## Installation
Install the SDK using pip:
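Assuming the package is published on PyPI as `olyptik`:

```bash
pip install olyptik
```

## Configuration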
First, initialize the SDK with your API key, which you can find on the settings page. You can pass the key directly or read it from an environment variable.
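A minimal sketch; the `Olyptik` client class and the `OLYPTIK_API_KEY` variable name are assumptions:

```python
import os

from olyptik import Olyptik  # client class name is an assumption

# Pass the key directly:
client = Olyptik(api_key="YOUR_API_KEY")

# Or read it from an environment variable:
client = Olyptik(api_key=os.environ["OLYPTIK_API_KEY"])
```

## Synchronous Usage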
### Start a crawl
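A sketch assuming the client exposes a `start_crawl` method (name assumed) that accepts the `StartCrawlPayload` described under Configuration Options below:

```python
from olyptik import Olyptik, StartCrawlPayload

client = Olyptik(api_key="YOUR_API_KEY")

# startUrl is required, along with one of maxResults, useSitemap, or entireWebsite
payload = StartCrawlPayload(startUrl="https://example.com", maxResults=100)
crawl = client.start_crawl(payload)
print(crawl.id)
```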
### Get crawl results
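Assuming a `get_crawl_results` method keyed by the crawl's id:

```python
results = client.get_crawl_results(crawl.id)
for result in results:
    print(result.url)
    print(result.markdown)
```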
### Abort a crawl
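A running crawl can be stopped early; `abort_crawl` is an assumed method name:

```python
client.abort_crawl(crawl.id)
```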
### Get crawl logs
Retrieve logs for a specific crawl to monitor its progress and debug issues:
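Assuming a `get_crawl_logs` method:

```python
logs = client.get_crawl_logs(crawl.id)
for log in logs:
    print(log.message)
```

## Asynchronous Usage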
For better performance with I/O operations, use the async client:
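The examples below are a sketch, assuming the SDK ships an `AsyncOlyptik` client (name assumed) whose methods are awaitable counterparts of the synchronous ones.

### Start a crawl

```python
import asyncio

from olyptik import AsyncOlyptik, StartCrawlPayload

async def main() -> None:
    client = AsyncOlyptik(api_key="YOUR_API_KEY")
    payload = StartCrawlPayload(startUrl="https://example.com", maxResults=100)
    crawl = await client.start_crawl(payload)
    print(crawl.id)

asyncio.run(main())
```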
### Get crawl results
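Inside the same coroutine:

```python
results = await client.get_crawl_results(crawl.id)
for result in results:
    print(result.url)
```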
### Abort a crawl
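```python
await client.abort_crawl(crawl.id)
```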
### Get crawl logs
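```python
logs = await client.get_crawl_logs(crawl.id)
for log in logs:
    print(log.message)
```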
## Configuration Options
### StartCrawlPayload
The configuration options available in the run crawl payload:

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results’ markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results’ markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: `auto`, `cheerio` (fast, static sites), `playwright` (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
## Engine Types
Choose the appropriate engine for your crawling needs:
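- `auto` (default): picks an engine automatically
- `cheerio`: fast; suited to static sites
- `playwright`: renders JavaScript; suited to dynamic sites

For example, to force the `playwright` engine for a JavaScript-heavy site:

```python
payload = StartCrawlPayload(
    startUrl="https://example.com",
    maxResults=50,
    engineType="playwright",  # render JavaScript before extracting content
)
```

## Crawl Status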
Monitor your crawl status using the `CrawlStatus` enum:
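A sketch of polling the status; the enum name comes from this section, but the member names and the `get_crawl` accessor are assumptions:

```python
from olyptik import CrawlStatus

crawl = client.get_crawl(crawl.id)  # hypothetical: fetch crawl metadata
if crawl.status == CrawlStatus.RUNNING:      # assumed member name
    print("Crawl is still in progress")
elif crawl.status == CrawlStatus.COMPLETED:  # assumed member name
    print("Crawl finished")
```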
## Error Handling
The SDK provides comprehensive error handling:
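A minimal sketch; `OlyptikError` stands in for whatever base exception the SDK actually exports:

```python
from olyptik import Olyptik, OlyptikError, StartCrawlPayload  # error class name is an assumption

client = Olyptik(api_key="YOUR_API_KEY")
try:
    payload = StartCrawlPayload(startUrl="https://example.com", maxResults=10)
    crawl = client.start_crawl(payload)
except OlyptikError as exc:
    print(f"Failed to start crawl: {exc}")
```

## Data Models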
### CrawlResult
Each crawl result contains:
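Plausible fields, inferred from the options above; verify against the SDK:

```python
result = results[0]
print(result.url)       # page URL (assumed field)
print(result.markdown)  # page content as markdown (assumed field)
```

### Crawl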
Crawl metadata includes:
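Likewise inferred:

```python
print(crawl.id)      # crawl identifier (assumed field)
print(crawl.status)  # a CrawlStatus value (assumed field)
```

### CrawlLog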
Each crawl log entry contains:
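For example:

```python
for log in logs:
    print(log.message)  # human-readable log line (assumed field)
```

## Support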
- 📧 Email: support@olyptik.io
- 📚 API Reference: API Documentation