
Installation

Install the SDK using npm:
npm install olyptik

Configuration

First, initialize the SDK with your API key, which you can find on the settings page. You can either pass it directly or read it from an environment variable.
import Olyptik from 'olyptik';

// Initialize with API key
const client = new Olyptik({ apiKey: 'your-api-key' });
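
To initialize from an environment variable instead, read the key before constructing the client. A minimal sketch; the variable name OLYPTIK_API_KEY is an assumption, not a documented convention:

// Read the API key from the environment (variable name is illustrative)
const apiKey = process.env.OLYPTIK_API_KEY;
if (!apiKey) {
  throw new Error('Missing OLYPTIK_API_KEY environment variable');
}
const client = new Olyptik({ apiKey });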

Usage

Starting a crawl

The SDK allows you to start web crawls with various configuration options:
const crawl = await client.runCrawl({
  startUrl: 'https://example.com',
  maxResults: 10
});
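
All of the options documented under RunCrawlPayload below can be combined in the same call. For example:

// A fuller configuration using documented RunCrawlPayload options
const configuredCrawl = await client.runCrawl({
  startUrl: 'https://example.com',
  maxResults: 100,          // 1-10,000
  maxDepth: 5,              // follow links up to 5 levels deep (1-100)
  engineType: 'playwright', // browser engine for dynamic sites
  timeout: 30               // abort automatically after 30 minutes
});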

Get crawl

Retrieve a crawl by its ID; the response is a Crawl object (see Objects below).
const crawlDetails = await client.getCrawl(crawl.id);
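
If the crawl is still in progress, getCrawl can be used to poll until it reaches a terminal status. A minimal sketch using the documented status values:

// Poll every 5 seconds while the crawl is still running
let details = await client.getCrawl(crawl.id);
while (details.status === 'running') {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  details = await client.getCrawl(crawl.id);
}
console.log(`Crawl finished with status: ${details.status}`);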

Query crawls

Filter existing crawls, for example by status, with pagination:
const result = await client.queryCrawls({
  status: 'succeeded',
  page: 0
});
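
The shape of the query response isn't documented here; assuming it exposes a results array like the getCrawlLogs response below, iterating over matches might look like this sketch:

// Assumes a paginated response with a `results` array of Crawl objects
for (const c of result.results) {
  console.log(`${c.id}: ${c.status} (${c.totalPages} pages)`);
}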

Get crawl results

Retrieve the results of your crawl using the crawl ID:
const results = await client.getCrawlResults(crawl.id, 0, 50);
The results are paginated; the second and third arguments are the page number (starting at 0) and the per-page limit.
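
To collect every result, you can walk the pages until one comes back short. A sketch that assumes pages start at 0 (as above) and that the response exposes a results array like getCrawlLogs:

// Fetch all pages of crawl results
const pageSize = 50;
const allResults = [];
for (let page = 0; ; page++) {
  const batch = await client.getCrawlResults(crawl.id, page, pageSize);
  allResults.push(...batch.results);
  if (batch.results.length < pageSize) break; // short page: nothing left
}
console.log(`Fetched ${allResults.length} page results`);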

Abort a crawl

Stop a running crawl by its ID:
const abortedCrawl = await client.abortCrawl(crawl.id);

Get crawl logs

Retrieve logs for a specific crawl to monitor its progress and debug issues:
const page = 1;
const limit = 1200;
const logs = await client.getCrawlLogs(crawl.id, page, limit);

// Iterate through logs
for (const log of logs.results) {
  console.log(`[${log.level}] ${log.message}: ${log.description}`);
}
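
The documented log levels make it easy to surface only problems, for example:

// Keep only error-level entries from the fetched page of logs
const errorLogs = logs.results.filter((log) => log.level === 'error');
for (const log of errorLogs) {
  console.error(`[${log.createdAt}] ${log.message}`);
}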

Objects

RunCrawlPayload

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | conditional¹ | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | conditional¹ | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | conditional¹ | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results' markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results' markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |

¹ Either maxResults/useSitemap or entireWebsite must be provided.

Crawl

| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique crawl identifier |
| status | string | Current status ("running", "succeeded", "failed", "timed_out", "aborted", "error") |
| startUrls | string[] | Starting URLs |
| includeLinks | boolean | Whether links are included |
| maxDepth | number | Maximum crawl depth |
| maxResults | number | Maximum number of results |
| teamId | string | Team identifier |
| projectId | string | Project identifier |
| createdAt | string | Creation timestamp |
| completedAt | string \| null | Completion timestamp |
| durationInSeconds | number | Total duration in seconds |
| totalPages | number | Count of pages extracted |
| useSitemap | boolean | Whether the sitemap was used |
| extraction | string | Instructions defining how the AI should extract specific content from the crawl results |
| entireWebsite | boolean | Whether the entire website is crawled |
| excludeNonMainTags | boolean | Whether non-main tags are excluded from the crawl results' markdown |
| deduplicateContent | boolean | Whether duplicate text fragments that appeared on other pages are removed |
| timeout | number | The timeout of the crawl in minutes |

CrawlResult

Each crawl result includes:
| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the page result |
| crawlId | string | Unique identifier for the crawl |
| projectId | string | Project identifier |
| url | string | The crawled URL |
| title | string | Page title extracted from the HTML |
| markdown | string | Extracted content in markdown format |
| processedByAI | boolean | Whether the result was processed by the AI |
| depthOfUrl | number | How deep this URL was in the crawl (0 = start URL) |
| isSuccess | boolean | Whether the page was crawled successfully |
| error | string | Error message if the page crawl failed |
| createdAt | string | ISO timestamp when the result was created |

CrawlLog

Each crawl log includes:
| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the log entry |
| message | string | Log message |
| level | 'info' \| 'debug' \| 'warn' \| 'error' | Log level |
| description | string | Detailed description of the log entry |
| crawlId | string | Unique identifier for the crawl |
| teamId | string \| null | Team identifier |
| data | object \| null | Additional data associated with the log |
| createdAt | Date | Timestamp when the log was created |

Error Handling

The SDK throws errors for various scenarios. Always wrap your calls in try-catch blocks:
try {
  const crawl = await client.runCrawl({
    startUrl: 'https://example.com',
    maxResults: 10
  });
} catch (error) {
  console.error('Crawl failed:', error.message);
}

Next Steps
