
Installation

Install the SDK using npm:
npm install olyptik

Configuration

First, initialize the SDK with your API key, which you can get from the settings page. You can either pass it directly or load it from an environment variable.
import Olyptik from 'olyptik';

// Initialize with API key
const client = new Olyptik({ apiKey: 'your-api-key' });
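
If you prefer not to hard-code the key, you can read it from an environment variable yourself. A minimal sketch, assuming you export the key as OLYPTIK_API_KEY (the variable name is an example, not one the SDK requires):
import Olyptik from 'olyptik';

// OLYPTIK_API_KEY is an example variable name; export it before running
const client = new Olyptik({ apiKey: process.env.OLYPTIK_API_KEY });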

Usage

Starting a crawl

The SDK allows you to start web crawls with various configuration options:
const crawl = await client.runCrawl({
  startUrl: 'https://example.com',
  maxResults: 10
});

Get crawl

Retrieve a crawl by its ID; the response is a Crawl object (see Objects below).
// crawl.id comes from a crawl started earlier (see runCrawl above)
const fetchedCrawl = await client.getCrawl(crawl.id);
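
A crawl runs asynchronously, so one common pattern is to poll getCrawl until the status leaves "running" (the possible values are listed under the Crawl object below). A minimal sketch:
// Poll every 5 seconds until the crawl reaches a terminal status
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let current = await client.getCrawl(crawl.id);
while (current.status === 'running') {
  await sleep(5000);
  current = await client.getCrawl(crawl.id);
}
console.log(`Crawl finished with status: ${current.status}`);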

Query crawls

const result = await client.queryCrawls({
  status: "succeeded",
  page: 0
});
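
The status filter accepts the same status values listed for the Crawl object below. For example, to fetch the next page of failed crawls:
// Query the second page of crawls that ended in failure
const failedCrawls = await client.queryCrawls({
  status: "failed",
  page: 1
});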

Get crawl results

Retrieve the results of your crawl using the crawl ID:
const results = await client.getCrawlResults(crawl.id, 0, 50);
The results are paginated, and you can specify the page number and limit per page.
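
For example, to fetch the second page with the same page size (this sketch assumes the response exposes its items on a results array, as getCrawlLogs does):
// Page 1 is the second page; 50 results per page
const nextPage = await client.getCrawlResults(crawl.id, 1, 50);

for (const result of nextPage.results) {
  console.log(`${result.url} (depth ${result.depthOfUrl})`);
}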

Abort a crawl

const abortedCrawl = await client.abortCrawl(crawl.id);

Get crawl logs

Retrieve logs for a specific crawl to monitor its progress and debug issues:
const page = 1;
const limit = 1200;
const logs = await client.getCrawlLogs(crawl.id, page, limit);

// Iterate through logs
for (const log of logs.results) {
  console.log(`[${log.level}] ${log.message}: ${log.description}`);
}
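
Each entry carries a level (see the CrawlLog object below), so you can, for example, surface only errors:
// Keep only error-level entries from the fetched page of logs
const errorLogs = logs.results.filter((log) => log.level === 'error');
for (const log of errorLogs) {
  console.error(`${log.createdAt} ${log.message}`);
}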

Scrape multiple URLs

Scrape up to 30 URLs at once without following links:
const scrapeResponse = await client.scrape({
  urls: ['https://example.com', 'https://example.com/about']
});

for (const result of scrapeResponse.results) {
  if (result.isSuccess) {
    console.log(`URL: ${result.url}`);
    console.log(`Title: ${result.title}`);
    console.log(`Content: ${result.markdown.substring(0, 100)}...`);
  }
}
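
Failed URLs appear in the same results array; their errorCode and errorMessage fields (see the UrlResult object below) describe what went wrong:
// Report any URLs that could not be scraped
for (const result of scrapeResponse.results) {
  if (!result.isSuccess) {
    console.error(`Failed ${result.url}: ${result.errorMessage} (code ${result.errorCode})`);
  }
}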

Objects

RunCrawlPayload

Property | Type | Required | Default | Description
startUrl | string | ✅ | - | The URL to start crawling from
maxResults | number | ❌ (conditional: either "maxResults"/"useSitemap" or "entireWebsite" must be provided) | - | Maximum number of results to collect (1-10,000)
useSitemap | boolean | ❌ (conditional: either "maxResults"/"useSitemap" or "entireWebsite" must be provided) | false | Whether to use sitemap.xml to crawl the website
entireWebsite | boolean | ❌ (conditional: either "maxResults"/"useSitemap" or "entireWebsite" must be provided) | false | Whether to crawl the entire website
maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100)
includeLinks | boolean | ❌ | true | Whether to include links in the crawl results' markdown
excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results' markdown
deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages
extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results
timeout | number | ❌ | 60 | Timeout duration in minutes
engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl
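
For example, a sketch of a payload that crawls via the site's sitemap and passes an illustrative extraction instruction:
// useSitemap satisfies the "maxResults / useSitemap / entireWebsite" requirement
const sitemapCrawl = await client.runCrawl({
  startUrl: 'https://example.com',
  useSitemap: true,
  maxDepth: 5,
  extraction: 'Extract only pricing information from each page',
  engineType: 'playwright' // force the dynamic-site engine
});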

Crawl

Property | Type | Description
id | string | Unique crawl identifier
status | string | Current status ("running", "succeeded", "failed", "timed_out", "aborted", "error")
startUrls | string[] | Starting URLs
includeLinks | boolean | Whether links are included
maxDepth | number | Maximum crawl depth
maxResults | number | Maximum number of results
teamId | string | Team identifier
projectId | string | Project identifier
createdAt | string | Creation timestamp
completedAt | string or null | Completion timestamp
durationInSeconds | number | Total duration
totalPages | number | Count of pages extracted
useSitemap | boolean | Whether sitemap was used
extraction | string | Instructions defining how the AI should extract specific content from the crawl results
entireWebsite | boolean | Whether to crawl the entire website
excludeNonMainTags | boolean | Whether non-main tags are excluded from the crawl results' markdown
deduplicateContent | boolean | Whether to remove duplicate text fragments that appeared on other pages
timeout | number | The timeout of the crawl in minutes

CrawlResult

Each crawl result includes:
Property | Type | Description
id | string | Unique identifier for the page result
crawlId | string | Unique identifier for the crawl
projectId | string | Project identifier
url | string | The crawled URL
title | string | Page title extracted from the HTML
markdown | string | Extracted content in markdown format
processedByAI | boolean | Whether the crawl result was processed by the AI
depthOfUrl | number | How deep this URL was in the crawl (0 = start URL)
isSuccess | boolean | Whether the crawl was successful
error | string | Error message if the crawl failed
createdAt | string | ISO timestamp when the result was created

CrawlLog

Each crawl log includes:
Property | Type | Description
id | string | Unique identifier for the log entry
message | string | Log message
level | 'info', 'debug', 'warn' or 'error' | Log level
description | string | Detailed description of the log entry
crawlId | string | Unique identifier for the crawl
teamId | string or null | Team identifier
data | object or null | Additional data associated with the log
createdAt | Date | Timestamp when the log was created

StartScrapePayload

Property | Type | Required | Default | Description
urls | string[] | ✅ | - | Array of URLs to scrape (max 30 URLs)
includeLinks | boolean | ❌ | true | Whether to include links in the scrape results' markdown
excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the scrape results' markdown
deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared in multiple scraped pages
extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the scrape results
timeout | number | ❌ | 5 | Timeout duration in minutes
engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites)
useStaticIps | boolean | ❌ | false | Whether to use static IPs for the scrape
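
For example, a scrape that skips link extraction and allows up to 10 minutes (the URLs are placeholders):
const scrape = await client.scrape({
  urls: ['https://example.com/pricing', 'https://example.com/docs'],
  includeLinks: false, // omit links from the returned markdown
  timeout: 10
});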

ScrapeResponse

The response from a scrape operation:
Property | Type | Description
id | string | Unique scrape identifier
teamId | string | Team identifier
projectId | string | Project identifier
results | UrlResult[] | Array of scrape results
timeout | number | Timeout in minutes
origin | string | Origin of the scrape ("api" or "web")
createdAt | Date | Creation timestamp
updatedAt | Date | Last update timestamp

UrlResult

Each URL scrape result includes:
Property | Type | Description
url | string | The URL that was scraped
isSuccess | boolean | Whether the scrape was successful
title | string | Page title
markdown | string | Extracted content in markdown format
links | string[] | Links found on the page
duplicatesRemovedCount | number | Number of duplicate content blocks removed
errorCode | number | Error code if the scrape failed
errorMessage | string | Error message if the scrape failed

Error Handling

The SDK throws errors for various scenarios. Always wrap your calls in try-catch blocks:
try {
  const crawl = await client.runCrawl({
    startUrl: 'https://example.com',
    maxResults: 10
  });
} catch (error) {
  console.error('Crawl failed:', error.message);
}

Next Steps