Installation

Install the SDK using pip:
pip install olyptik

Configuration

First, initialize the SDK with your API key, which you can find on the settings page. You can either pass it directly or load it from an environment variable.
from olyptik import Olyptik

# Initialize with API key
client = Olyptik(api_key="your_api_key")
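If you prefer not to hard-code the key, here is a minimal sketch that reads it from an environment variable yourself and passes it in (the variable name OLYPTIK_API_KEY is illustrative, not an SDK convention):

import os

from olyptik import Olyptik

# OLYPTIK_API_KEY is a hypothetical variable name; use whatever your deployment defines
client = Olyptik(api_key=os.environ["OLYPTIK_API_KEY"])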

Synchronous Usage

Start a crawl

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

print(f"Crawl started with ID: {crawl.id}")
print(f"Status: {crawl.status}")

Get crawl results

results = client.get_crawl_results(crawl.id)
for result in results.results:
    print(f"URL: {result.url}")
    print(f"Title: {result.title}")
    print(f"Depth: {result.depthOfUrl}")

Abort a crawl

aborted_crawl = client.abort_crawl(crawl.id)
print(f"Crawl aborted with ID: {aborted_crawl.id}")

Get crawl logs

Retrieve logs for a specific crawl to monitor its progress and debug issues:
page = 1
limit = 1200
logs = client.get_crawl_logs(crawl.id, page, limit)
for log in logs.results:
    print(f"[{log.level}] {log.message}: {log.description}")
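Logs are paginated via the page and limit arguments. A sketch of walking every page, reusing the client and crawl from above and assuming that a page returning fewer entries than limit is the last one:

page = 1
limit = 200
while True:
    logs = client.get_crawl_logs(crawl.id, page, limit)
    for log in logs.results:
        print(f"[{log.level}] {log.message}: {log.description}")
    # Assumption: a short page means there are no further pages
    if len(logs.results) < limit:
        break
    page += 1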

Asynchronous Usage

For better performance in I/O-bound applications, use the async client:

Start a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })

        print(f"Crawl started with ID: {crawl.id}")
        print(f"Status: {crawl.status}")

asyncio.run(main())

Get crawl results

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl results
        results = await client.get_crawl_results(crawl.id)
        for result in results.results:
            print(f"URL: {result.url}")
            print(f"Title: {result.title}")
            print(f"Depth: {result.depthOfUrl}")

asyncio.run(main())

Abort a crawl

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Abort the crawl
        aborted_crawl = await client.abort_crawl(crawl.id)
        print(f"Crawl aborted with ID: {aborted_crawl.id}")

asyncio.run(main())

Get crawl logs

import asyncio
from olyptik import AsyncOlyptik

async def main():
    async with AsyncOlyptik(api_key="your_api_key_here") as client:
        # First start a crawl
        crawl = await client.run_crawl({
            "startUrl": "https://example.com",
            "maxResults": 50
        })
        
        # Get crawl logs
        page = 1
        limit = 1200
        logs = await client.get_crawl_logs(crawl.id, page, limit)
        for log in logs.results:
            print(f"[{log.level}] {log.message}: {log.description}")

asyncio.run(main())

Configuration Options

StartCrawlPayload

The configuration options available in the run_crawl payload (an example combining several of them follows the table):
| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ (conditional: either maxResults/useSitemap or entireWebsite must be provided) | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results' markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results' markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
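As an illustration of how these options combine, a sketch of a fuller payload, reusing the client from the Configuration section (the specific values are arbitrary):

crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 200,
    "maxDepth": 3,
    "includeLinks": False,
    "deduplicateContent": True,
    "engineType": "cheerio",
    "timeout": 30
})
print(f"Status: {crawl.status}")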

Engine Types

Choose the appropriate engine for your crawling needs:
from olyptik import EngineType

# Available engine types
EngineType.AUTO        # Automatically choose the best engine
EngineType.PLAYWRIGHT  # Use Playwright for JavaScript-heavy sites
EngineType.CHEERIO     # Use Cheerio for faster, static content crawling
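For example, to force the static-content engine, set engineType in the payload (reusing the client from the Configuration section). The table above documents the string values; whether the SDK also accepts the enum members directly is an assumption, so the string form is shown here:

# "cheerio" is one of the documented string values; depending on the SDK,
# EngineType.CHEERIO may be accepted here as well (assumption).
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50,
    "engineType": "cheerio"
})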

Crawl Status

Monitor your crawl status using the CrawlStatus enum:
from olyptik import CrawlStatus

# Possible status values
CrawlStatus.RUNNING    # Crawl is currently running
CrawlStatus.SUCCEEDED  # Crawl completed successfully
CrawlStatus.FAILED     # Crawl failed due to an error
CrawlStatus.TIMED_OUT  # Crawl exceeded timeout limit
CrawlStatus.ABORTED    # Crawl was manually aborted
CrawlStatus.ERROR      # Crawl encountered an error
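For instance, you can compare the status returned by run_crawl against these values:

from olyptik import CrawlStatus, Olyptik

client = Olyptik(api_key="your_api_key_here")
crawl = client.run_crawl({
    "startUrl": "https://example.com",
    "maxResults": 50
})

if crawl.status == CrawlStatus.RUNNING:
    print("Crawl is still in progress")
elif crawl.status in (CrawlStatus.FAILED, CrawlStatus.ERROR, CrawlStatus.TIMED_OUT):
    print(f"Crawl did not complete: {crawl.status}")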

Error Handling

The SDK provides comprehensive error handling:
from olyptik import Olyptik, OlyptikError, ApiError

client = Olyptik(api_key="your_api_key_here")

try:
    crawl = client.run_crawl({
        "startUrl": "https://example.com",
        "maxResults": 10
    })
except ApiError as e:
    print(f"API Error: {e.message}")
    print(f"Status Code: {e.status_code}")
except OlyptikError as e:
    print(f"SDK Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Data Models

CrawlResult

Each crawl result contains:
@dataclass
class CrawlResult:
    crawlId: str          # Unique identifier for the crawl
    teamId: str           # Team identifier
    projectId: str        # Project identifier
    url: str              # The crawled URL
    title: str            # Page title
    processedByAI: bool   # Whether the crawl result was processed by the AI
    markdown: str         # Extracted content in markdown format
    depthOfUrl: int       # How deep this URL was in the crawl
    createdAt: str        # When the result was created
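As a sketch of working with these fields, one way to write each result's markdown to disk, reusing the client and crawl from the synchronous examples above (the file naming scheme is illustrative only):

from pathlib import Path

results = client.get_crawl_results(crawl.id)
out_dir = Path("crawl_output")
out_dir.mkdir(exist_ok=True)

for index, result in enumerate(results.results):
    # Illustrative file name: sequential number plus .md extension
    path = out_dir / f"{index:04d}.md"
    path.write_text(f"# {result.title}\n\n{result.markdown}\n", encoding="utf-8")
    print(f"Saved {result.url} (depth {result.depthOfUrl}) to {path}")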

Crawl

Crawl metadata includes:
@dataclass
class Crawl:
    id: str                    # Unique crawl identifier
    status: CrawlStatus        # Current status
    startUrls: List[str]       # Starting URLs
    includeLinks: bool         # Whether links are included
    maxDepth: int              # Maximum crawl depth
    maxResults: int            # Maximum number of results
    teamId: str                # Team identifier
    projectId: str             # Project identifier
    createdAt: str             # Creation timestamp
    completedAt: Optional[str] # Completion timestamp
    durationInSeconds: int     # Total duration
    extraction: str            # Instructions defining how the AI should extract specific content from the crawl results
    totalPages: int            # Count of pages extracted
    useSitemap: bool           # Whether sitemap was used
    entireWebsite: bool        # Whether to use both sitemap and all found links
    excludeNonMainTags: bool   # Whether non-main tags are excluded from the crawl results' markdown
    timeout: int               # Timeout (minutes)

CrawlLog

Each crawl log entry contains:
@dataclass
class CrawlLog:
    id: str                        # Unique log identifier
    message: str                   # Log message
    level: CrawlLogLevel           # Log level (info, debug, warn, error)
    description: str               # Detailed description
    crawlId: str                   # Crawl identifier
    teamId: Optional[str]          # Team identifier
    data: Optional[Dict[str, Any]] # Additional log data
    createdAt: Optional[str]       # Creation timestamp
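For example, to surface only the most serious entries you can filter on the level field. Comparing against the string "error" is an assumption here; the CrawlLogLevel annotation above suggests an enum may be available for a stricter check:

logs = client.get_crawl_logs(crawl.id, 1, 200)
for log in logs.results:
    # Assumption: error-level entries render as "error" (or CrawlLogLevel.ERROR)
    if "error" in str(log.level).lower():
        print(f"{log.createdAt} [{log.level}] {log.message}: {log.description}")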

Support

I