## Installation
Install the SDK using npm:
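A minimal install command; the package name below is a placeholder, since it isn't stated here — use the name from the SDK's npm page:

```bash
# Placeholder package name — substitute the SDK's actual npm package
npm install <sdk-package-name>
```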
## Configuration

First, you’ll need to initialize the SDK with your API key, which you can get from the settings page. You can either pass it directly or use environment variables.
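A sketch of the initialization step. The class name, option shape, and the `CRAWLER_API_KEY` variable name are assumptions, not the SDK's documented API:

```typescript
// Hypothetical client setup — the class name, option shape, and the
// CRAWLER_API_KEY variable name are assumptions, not documented SDK API.
class CrawlerClient {
  readonly apiKey: string;

  constructor(options: { apiKey?: string } = {}) {
    // Fall back to an environment variable when no key is passed directly.
    const key = options.apiKey ?? process.env.CRAWLER_API_KEY;
    if (!key) {
      throw new Error("Set apiKey or the CRAWLER_API_KEY environment variable");
    }
    this.apiKey = key;
  }
}

// Pass the key from the settings page directly…
const client = new CrawlerClient({ apiKey: "YOUR_API_KEY" });
// …or omit it and rely on CRAWLER_API_KEY being set in the environment.
```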
## Usage

### Starting a crawl
The SDK allows you to start web crawls with various configuration options:
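The SDK's exact method names aren't shown here, so the sketch below uses a hypothetical `runCrawl` function, mocked so the snippet is self-contained; the payload fields and the either/or requirement come from the RunCrawlPayload table below:

```typescript
// Hypothetical, mocked stand-in for the SDK's crawl-start call; the
// payload fields come from the RunCrawlPayload table documented below.
async function runCrawl(payload: {
  startUrl: string;
  maxResults?: number;
  useSitemap?: boolean;
  entireWebsite?: boolean;
  maxDepth?: number;
}): Promise<{ id: string; status: string }> {
  // Either maxResults/useSitemap or entireWebsite must be provided.
  if (payload.maxResults === undefined && !payload.useSitemap && !payload.entireWebsite) {
    throw new Error("Provide maxResults/useSitemap or entireWebsite");
  }
  // The real SDK would call the API here and return the new crawl.
  return { id: "crawl_123", status: "running" };
}

runCrawl({ startUrl: "https://example.com", maxResults: 100, maxDepth: 3 })
  .then((crawl) => console.log(`Started crawl ${crawl.id} (${crawl.status})`));
```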
### Get crawl

Retrieve a crawl; the response will be a Crawl object.
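A sketch of retrieving a crawl, using a hypothetical, mocked `getCrawl` function; the fields and status values come from the Crawl object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's retrieval call; the fields
// and status values come from the Crawl object documented below.
interface Crawl {
  id: string;
  status: "running" | "succeeded" | "failed" | "timed_out" | "aborted" | "error";
  startUrls: string[];
  totalPages: number;
  completedAt: string | null;
}

async function getCrawl(crawlId: string): Promise<Crawl> {
  // The real SDK would fetch the crawl from the API by ID.
  return {
    id: crawlId,
    status: "succeeded",
    startUrls: ["https://example.com"],
    totalPages: 42,
    completedAt: "2024-01-01T00:00:00.000Z",
  };
}

getCrawl("crawl_123").then((crawl) => {
  console.log(`Crawl ${crawl.id}: ${crawl.status}, ${crawl.totalPages} pages`);
});
```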
### Query crawls

### Get crawl results
Retrieve the results of your crawl using the crawl ID:
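A sketch of fetching results by crawl ID. The `getCrawlResults` function is a hypothetical, mocked stand-in for the SDK call; the field names come from the CrawlResult object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's results call; the field
// names come from the CrawlResult object documented below.
interface CrawlResult {
  id: string;
  crawlId: string;
  url: string;
  title: string;
  markdown: string;
  isSuccess: boolean;
  error: string;
}

async function getCrawlResults(crawlId: string): Promise<CrawlResult[]> {
  // The real SDK would fetch these from the API by crawl ID.
  return [
    {
      id: "result_1",
      crawlId,
      url: "https://example.com",
      title: "Example Domain",
      markdown: "# Example Domain",
      isSuccess: true,
      error: "",
    },
  ];
}

getCrawlResults("crawl_123").then((results) => {
  for (const page of results) {
    if (page.isSuccess) console.log(page.url, "->", page.title);
  }
});
```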
### Abort a crawl

### Get crawl logs
Retrieve logs for a specific crawl to monitor its progress and debug issues:
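A sketch of log retrieval, again with a hypothetical, mocked `getCrawlLogs` function; the field names come from the CrawlLog object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's log-retrieval call; the
// field names come from the CrawlLog object documented below.
interface CrawlLog {
  id: string;
  message: string;
  level: "info" | "debug" | "warn" | "error";
  crawlId: string;
  createdAt: Date;
}

async function getCrawlLogs(crawlId: string): Promise<CrawlLog[]> {
  // The real SDK would fetch log entries from the API by crawl ID.
  return [
    { id: "log_1", message: "Crawl started", level: "info", crawlId, createdAt: new Date() },
  ];
}

getCrawlLogs("crawl_123").then((logs) => {
  for (const log of logs) {
    console.log(`[${log.level}] ${log.message}`);
  }
});
```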
## Objects

### RunCrawlPayload
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results’ markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results’ markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), or "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
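The table above can be summarized as a TypeScript shape. The field names, ranges, and defaults are taken from the table; the interface itself is an illustration, not SDK source:

```typescript
// Illustrative TypeScript shape of RunCrawlPayload, derived from the table
// above; not the SDK's own type definition.
interface RunCrawlPayload {
  startUrl: string;                 // required
  maxResults?: number;              // 1-10,000; conditional (see table)
  useSitemap?: boolean;             // default: false; conditional (see table)
  entireWebsite?: boolean;          // default: false; conditional (see table)
  maxDepth?: number;                // 1-100; default: 10
  includeLinks?: boolean;           // default: true
  excludeNonMainTags?: boolean;     // default: true
  deduplicateContent?: boolean;     // default: true
  extraction?: string;              // default: ""
  timeout?: number;                 // minutes; default: 60
  engineType?: "auto" | "cheerio" | "playwright"; // default: "auto"
  useStaticIps?: boolean;           // default: false
}

const payload: RunCrawlPayload = {
  startUrl: "https://example.com",
  entireWebsite: true,
};
```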
### Crawl
| Property | Type | Description |
|---|---|---|
| id | string | Unique crawl identifier |
| status | string | Current status: "running", "succeeded", "failed", "timed_out", "aborted", or "error" |
| startUrls | string[] | Starting URLs |
| includeLinks | boolean | Whether links are included |
| maxDepth | number | Maximum crawl depth |
| maxResults | number | Maximum number of results |
| teamId | string | Team identifier |
| projectId | string | Project identifier |
| createdAt | string | Creation timestamp |
| completedAt | string \| null | Completion timestamp |
| durationInSeconds | number | Total duration |
| totalPages | number | Count of pages extracted |
| useSitemap | boolean | Whether sitemap was used |
| extraction | string | Instructions defining how the AI should extract specific content from the crawl results |
| entireWebsite | boolean | Whether to crawl the entire website |
| excludeNonMainTags | boolean | Whether non-main tags are excluded from the crawl results’ markdown |
| deduplicateContent | boolean | Whether to remove duplicate text fragments that appeared on other pages |
| timeout | number | The timeout of the crawl in minutes |
### CrawlResult

Each crawl result includes:

| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the page result |
| crawlId | string | Unique identifier for the crawl |
| projectId | string | Project identifier |
| url | string | The crawled URL |
| title | string | Page title extracted from the HTML |
| markdown | string | Extracted content in markdown format |
| processedByAI | boolean | Whether the crawl result was processed by the AI |
| depthOfUrl | number | How deep this URL was in the crawl (0 = start URL) |
| isSuccess | boolean | Whether this page was crawled successfully |
| error | string | Error message if crawling this page failed |
| createdAt | string | ISO timestamp when the result was created |
### CrawlLog

Each crawl log includes:

| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the log entry |
| message | string | Log message |
| level | "info" \| "debug" \| "warn" \| "error" | Log level |
| description | string | Detailed description of the log entry |
| crawlId | string | Unique identifier for the crawl |
| teamId | string \| null | Team identifier |
| data | object \| null | Additional data associated with the log |
| createdAt | Date | Timestamp when the log was created |