
Introduction

Olyptik is a powerful web scraping and data extraction platform that helps you gather and process web data efficiently. This guide will walk you through the essential steps to get started with our API.

Getting Started with the API

Authentication

First, you’ll need an API key. You can get one from your settings page. Include it in your requests using the Authorization header:
Authorization: Bearer your-api-key
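If you call the API without the SDK, the header can be built once and reused across requests. The sketch below is only an illustration: `buildAuthHeaders` is a hypothetical helper, not part of the Olyptik SDK, and the only thing taken from this guide is the `Authorization: Bearer` format.

```javascript
// Hypothetical helper (not part of the Olyptik SDK): builds the
// Authorization header described above from an API key.
function buildAuthHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("An API key is required");
  }
  return { Authorization: `Bearer ${apiKey.trim()}` };
}

const headers = buildAuthHeaders("your-api-key");
console.log(headers.Authorization); // Bearer your-api-key
```

The returned object can be passed as the `headers` option of whichever HTTP client you use.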

Step 1: Create a Crawl

Start by creating a crawl with your target URL. This initiates the web scraping process:
// npm i olyptik
import Olyptik from "olyptik";

const olyptik = new Olyptik({ apiKey: 'your-api-key' });

const crawl = await olyptik.runCrawl({
  startUrl: 'https://example.com',
  maxResults: 50
});
  
console.log('Crawl created:', crawl.id);
The response will include a crawl object that you’ll use to track and retrieve results:
Crawl object
{
  "id": "6870e36787c81925622df818",
  "status": "running",
  "startUrls": ["https://example.com"],
  "maxResults": 50,
  "maxDepth": 5,
  "totalPages": 0,
  "includeLinks": true,
  "entireWebsite": false,
  "excludeNonMainTags": true,
  "deduplicateContent": true,
  "useStaticIps": false,
  "origin": "api",
  "teamId": "5870e36787c81925622df543",
  "projectId": "7895e36754c85425622dk946",
  "timeout": 60,
  "extraction": "",
  "durationInSeconds": 0,
  "completedAt": null,
  "createdAt": "2024-03-21T10:00:00Z",
  "updatedAt": "2024-03-21T10:00:00Z"
}

Step 2: Monitor Crawl Status

You have two options to track your crawl’s progress: webhooks or polling.

Option 1: Webhooks

Configure a webhook URL in your settings page to receive real-time status updates. You can choose from two notification types:
  • Crawl Status Changes: Receive notifications when the crawl status changes (succeeded, failed, timed out, etc.)
  • Individual Results: Receive notifications for each page crawled during the process
You can enable either option or both simultaneously. A crawl status change notification looks like this:
{
  "secret": "u9dv22c5",
  "eventType": "crawl_status_change",
  "data": {
    "crawl": {
      "id": "6870336416bf934c31bffdd8",
      "status": "timed_out",
      "durationInSeconds": 45001,
      "teamId": "680e8bc08cdb14f10180db6a",
      "projectId": "6450e36787c81925622df494",
      "startUrls": ["https://example.com"],
      "totalPages": 19,
      "entireWebsite": false,
      "useSitemap": false,
      "excludeNonMainTags": true,
      "deduplicateContent": true,
      "extraction": "",
      "useStaticIps": false,
      "maxDepth": 5,
      "maxResults": 20,
      "includeLinks": false,
      "origin": "api",
      "timeout": 60,
      "createdAt": "2025-07-10T21:40:52.580Z",
      "updatedAt": "2025-07-11T10:10:54.251Z",
      "completedAt": "2025-07-11T10:10:54.250Z"
    }
  }
}
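On the receiving side, a handler for this payload might look like the sketch below. It rests on assumptions that this guide does not confirm: that the `secret` field carries the webhook secret from your settings page, and that your server has already parsed the request body from JSON. `handleWebhook` and its return shape are hypothetical names chosen for illustration.

```javascript
// Hypothetical handler for a parsed webhook body.
// Assumption: payload.secret should equal the secret from your settings page.
function handleWebhook(payload, expectedSecret) {
  if (!payload || payload.secret !== expectedSecret) {
    return { accepted: false, reason: "invalid secret" };
  }
  if (payload.eventType === "crawl_status_change") {
    const { id, status, totalPages } = payload.data.crawl;
    return { accepted: true, kind: "status", crawlId: id, status, totalPages };
  }
  // Other event types (such as per-page results) are passed through,
  // because their payload shape is not shown in this guide.
  return { accepted: true, kind: "other", eventType: payload.eventType };
}

const example = {
  secret: "u9dv22c5",
  eventType: "crawl_status_change",
  data: { crawl: { id: "6870336416bf934c31bffdd8", status: "timed_out", totalPages: 19 } },
};
console.log(handleWebhook(example, "u9dv22c5").status); // timed_out
```

Rejecting payloads whose secret does not match keeps unrelated requests to your endpoint from being treated as crawl updates.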

Option 2: Polling

If you haven’t configured webhook notifications, you can poll the crawl status:
// crawlId is the id of the crawl created in Step 1
const crawl = await olyptik.getCrawl(crawlId);
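A single call returns only the current state, so polling usually means repeating it until the crawl finishes. The sketch below assumes that "running" is the only non-terminal status, which this guide does not state explicitly. `waitForCrawl`, `intervalMs`, and `maxAttempts` are hypothetical names, and `client` stands for any object exposing `getCrawl(crawlId)`, such as the SDK instance created in Step 1.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the crawl leaves the "running" state or the attempt budget runs out.
// Assumption: "running" is the only status that means the crawl is still in progress.
async function waitForCrawl(client, crawlId, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const crawl = await client.getCrawl(crawlId);
    if (crawl.status !== "running") {
      return crawl; // finished state, for example "timed_out"
    }
    if (attempt < maxAttempts) {
      await sleep(intervalMs); // wait before asking again
    }
  }
  throw new Error(`Crawl ${crawlId} is still running after ${maxAttempts} attempts`);
}

// Usage with the SDK instance from Step 1:
//   const finished = await waitForCrawl(olyptik, crawl.id);
```

The attempt cap keeps the loop from running indefinitely if a crawl never leaves the running state.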

Step 3: Get Crawl Results

Once the crawl is complete, retrieve the results:
const page = 1;
const itemsInPage = 10;
// crawlId is the id of the crawl created in Step 1
const results = await olyptik.getCrawlResults(crawlId, page, itemsInPage);
console.log(results);
Example response:
{
  "page": 1,
  "limit": 10,
  "totalPages": 2,
  "totalResults": 19,
  "results": [
    {
      "id": "6870e3c687c81925622df89e",
      "crawlId": "6870e36787c81925622df818",
      "teamId": "680e8bc08cdb14f10180db6a",
      "projectId": "6450e36787c81925622df494",
      "url": "https://example.com/solutions/network-observability",
      "processedByAI": false,
      "depthOfUrl": 4,
      "title": "Kubernetes Network Observability",
      "markdown": "## Awesome website content...",
      "createdAt": "2025-07-11T10:13:26.966Z"
    },
    ...
  ]
}
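Because results are paginated, collecting everything means requesting each page in turn. The sketch below assumes pages are numbered from 1 (as in the call above) and that the `totalPages` field of this response is the number of result pages. `getAllCrawlResults` is a hypothetical helper, and `client` stands for any object exposing `getCrawlResults(crawlId, page, itemsInPage)`.

```javascript
// Collects every result of a crawl by walking pages until totalPages is reached.
// Assumptions: pages are 1-based and response.totalPages counts result pages.
async function getAllCrawlResults(client, crawlId, itemsInPage = 50) {
  const all = [];
  let page = 1;
  let totalPages = 1;
  do {
    const response = await client.getCrawlResults(crawlId, page, itemsInPage);
    all.push(...response.results);
    totalPages = response.totalPages;
    page += 1;
  } while (page <= totalPages);
  return all;
}

// Usage with the SDK instance from Step 1:
//   const pages = await getAllCrawlResults(olyptik, crawl.id);
//   console.log(pages.length);
```

For large crawls, consider processing each page as it arrives instead of holding every result in memory.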

Next Steps

  • Check out our API Reference for detailed endpoint documentation

Need Help?

I