Installation

Install the SDK using pip:
pip install olyptik

Configuration

First, initialize the SDK with your API key, which you can find on the settings page. You can either pass it directly or load it from an environment variable.
from olyptik import Olyptik

# Initialize with API key
client = Olyptik(api_key="your_api_key")
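If you prefer not to hard-code the key, here is a minimal sketch that reads it from an environment variable yourself and passes it in (the variable name OLYPTIK_API_KEY is illustrative, not an SDK convention):

import os

from olyptik import Olyptik

# OLYPTIK_API_KEY is a hypothetical variable name; use whatever your deployment defines
client = Olyptik(api_key=os.environ["OLYPTIK_API_KEY"])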

Synchronous Usage

Start a crawl

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Get crawl results

results = client.get_crawl_results(crawl.id)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")

Abort a crawl

aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")

Get crawl logs

Retrieve logs for a specific crawl to monitor its progress and debug issues:
page = 1
limit = 1200
logs = client.get_crawl_logs(crawl.id, page, limit)
for log in logs.results:
    print(f"[{log.level}] {log.message}: {log.description}")
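Logs are paginated via the page and limit arguments. A sketch of walking every page, reusing the client and crawl from above and assuming that a page returning fewer entries than limit is the last one:

page = 1
limit = 200
while True:
    logs = client.get_crawl_logs(crawl.id, page, limit)
    for log in logs.results:
        print(f"[{log.level}] {log.message}: {log.description}")
    # Assumption: a short page means there are no further pages
    if len(logs.results) < limit:
        break
    page += 1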

Asynchronous Usage

For better performance in I/O-bound applications, use the async client:

Start a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Get crawl results

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl results
        results = await client.get_crawl_results(crawl.id)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())

Abort a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())

Get crawl logs

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl logs
        page = 1
        limit = 1200
        logs = await client.get_crawl_logs(crawl.id, page, limit)
        for log in logs.results:
            print(f"[{log.level}] {log.message}: {log.description}")

asyncio.run(main())

Configuration Options

StartCrawlPayload

The configuration options available in the run_crawl payload (an example combining several of them follows the table):
| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results' markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results' markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
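As an illustration of how these options combine, a sketch of a fuller payload, reusing the client from the Configuration section (the specific values are arbitrary):

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 200,
    "maxDepth": 3,
    "includeLinks": False,
    "deduplicateContent": True,
    "engineType": "cheerio",
    "timeout": 30
})
print(f"Status: {crawl.status}")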

Engine Types

Choose the appropriate engine for your crawling needs:
from olyptik import EngineType

# Available engine types
EngineType.AUTO        # Automatically choose the best engine
EngineType.PLAYWRIGHT  # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO     # Use Cheerio for faster, static content crawling
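For example, to force the static-content engine, set engineType in the payload (reusing the client from the Configuration section). The table above documents the string values; whether the SDK also accepts the enum members directly is an assumption, so the string form is shown here:

# "cheerio" is one of the documented string values; depending on the SDK,
# EngineType.CHEERIO may be accepted here as well (assumption).
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "engineType": "cheerio"
})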

Crawl Status

Monitor your crawl status using the CrawlStatus enum:
from olyptik import CrawlStatus

# Possible status values
CrawlStatus.RUNNING    # Crawl is currently running
CrawlStatus.SUCCEEDED  # Crawl completed successfully
CrawlStatus.FAILED     # Crawl failed due to an error
CrawlStatus.TIMED_OUT  # Crawl exceeded timeout limit
CrawlStatus.ABORTED    # Crawl was manually aborted
CrawlStatus.ERROR      # Crawl encountered an error
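For instance, you can compare the status returned by run_crawl against these values:

from olyptik import CrawlStatus, Olyptik

client = Olyptik(api_key="your_api_key_here")
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

if crawl.status == CrawlStatus.RUNNING:
    print("Crawl is still in progress")
elif crawl.status in (CrawlStatus.FAILED, CrawlStatus.ERROR, CrawlStatus.TIMED_OUT):
    print(f"Crawl did not complete: {crawl.status}")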

Error Handling

The SDK provides comprehensive error handling:
from olyptik import Olyptik, OlyptikError, ApiError

client = Olyptik(api_key="your_api_key_here")

try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
except OlyptikError as e:
    print(f"SDK Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Data Models

CrawlResult

Each crawl result contains:
@dataclass
class CrawlResult:
    crawlId: str          # Unique identifier for the crawl
    teamId: str           # Team identifier
    projectId: str        # Project identifier
    url: str              # The crawled URL
    title: str            # Page title
    processedByAI: bool   # Whether the crawl result was processed by the AI
    markdown: str         # Extracted content in markdown format
    depthOfUrl: int       # How deep this URL was in the crawl
    createdAt: str        # When the result was created
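As a sketch of working with these fields, one way to write each result's markdown to disk, reusing the client and crawl from the synchronous examples above (the file naming scheme is illustrative only):

from pathlib import Path

results = client.get_crawl_results(crawl.id)
out_dir = Path("crawl_output")
out_dir.mkdir(exist_ok=True)

for index, result in enumerate(results.results):
    # Illustrative file name: sequential number plus .md extension
    path = out_dir / f"{index:04d}.md"
    path.write_text(f"# {result.title}\n\n{result.markdown}\n", encoding="utf-8")
    print(f"Saved {result.url} (depth {result.depthOfUrl}) to {path}")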

Crawl

Crawl metadata includes:
@dataclass
class Crawl:
    id: str                    # Unique crawl identifier
    status: CrawlStatus        # Current status
    startUrls: List[str]       # Starting URLs
    includeLinks: bool         # Whether links are included
    maxDepth: int              # Maximum crawl depth
    maxResults: int            # Maximum number of results
    teamId: str                # Team identifier
    projectId: str             # Project identifier
    createdAt: str             # Creation timestamp
    completedAt: Optional[str] # Completion timestamp
    durationInSeconds: int     # Total duration
    extraction: str            # Instructions defining how the AI should extract specific content from the crawl results
    totalPages: int            # Count of pages extracted
    useSitemap: bool           # Whether sitemap was used
    entireWebsite: bool        # Whether to use both sitemap and all found links
    excludeNonMainTags: bool   # Whether non-main tags are excluded from the crawl results' markdown
    timeout: int               # Timeout (minutes)

CrawlLog

Each crawl log entry contains:
@dataclass
class CrawlLog:
    id: str                        # Unique log identifier
    message: str                   # Log message
    level: CrawlLogLevel           # Log level (info, debug, warn, error)
    description: str               # Detailed description
    crawlId: str                   # Crawl identifier
    teamId: Optional[str]          # Team identifier
    data: Optional[Dict[str, Any]] # Additional log data
    createdAt: Optional[str]       # Creation timestamp
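For example, to surface only the most serious entries you can filter on the level field. Comparing against the string "error" is an assumption here; the CrawlLogLevel annotation above suggests an enum may be available for a stricter check:

logs = client.get_crawl_logs(crawl.id, 1, 200)
for log in logs.results:
    # Assumption: error-level entries render as "error" (or CrawlLogLevel.ERROR)
    if "error" in str(log.level).lower():
        print(f"{log.createdAt} [{log.level}] {log.message}: {log.description}")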

Support

I