## Installation
Install the SDK using npm:
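A minimal install command; the package name below is a placeholder, since it isn't stated here — use the name from the SDK's npm page:

```bash
# Placeholder package name — substitute the SDK's actual npm package
npm install <sdk-package-name>
```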
## Configuration

First, you’ll need to initialize the SDK with your API key, which you can get from the settings page. You can either pass it directly or use environment variables.
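A sketch of the initialization step. The class name, option shape, and the `CRAWLER_API_KEY` variable name are assumptions, not the SDK's documented API:

```typescript
// Hypothetical client setup — the class name, option shape, and the
// CRAWLER_API_KEY variable name are assumptions, not documented SDK API.
class CrawlerClient {
  readonly apiKey: string;

  constructor(options: { apiKey?: string } = {}) {
    // Fall back to an environment variable when no key is passed directly.
    const key = options.apiKey ?? process.env.CRAWLER_API_KEY;
    if (!key) {
      throw new Error("Set apiKey or the CRAWLER_API_KEY environment variable");
    }
    this.apiKey = key;
  }
}

// Pass the key from the settings page directly…
const client = new CrawlerClient({ apiKey: "YOUR_API_KEY" });
// …or omit it and rely on CRAWLER_API_KEY being set in the environment.
```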
## Usage

### Starting a crawl
The SDK allows you to start web crawls with various configuration options:
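The SDK's exact method names aren't shown here, so the sketch below uses a hypothetical `runCrawl` function, mocked so the snippet is self-contained; the payload fields and the either/or requirement come from the RunCrawlPayload table below:

```typescript
// Hypothetical, mocked stand-in for the SDK's crawl-start call; the
// payload fields come from the RunCrawlPayload table documented below.
async function runCrawl(payload: {
  startUrl: string;
  maxResults?: number;
  useSitemap?: boolean;
  entireWebsite?: boolean;
  maxDepth?: number;
}): Promise<{ id: string; status: string }> {
  // Either maxResults/useSitemap or entireWebsite must be provided.
  if (payload.maxResults === undefined && !payload.useSitemap && !payload.entireWebsite) {
    throw new Error("Provide maxResults/useSitemap or entireWebsite");
  }
  // The real SDK would call the API here and return the new crawl.
  return { id: "crawl_123", status: "running" };
}

runCrawl({ startUrl: "https://example.com", maxResults: 100, maxDepth: 3 })
  .then((crawl) => console.log(`Started crawl ${crawl.id} (${crawl.status})`));
```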
### Get crawl

Retrieve a crawl; the response will be a Crawl object.
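A sketch of retrieving a crawl, using a hypothetical, mocked `getCrawl` function; the fields and status values come from the Crawl object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's retrieval call; the fields
// and status values come from the Crawl object documented below.
interface Crawl {
  id: string;
  status: "running" | "succeeded" | "failed" | "timed_out" | "aborted" | "error";
  startUrls: string[];
  totalPages: number;
  completedAt: string | null;
}

async function getCrawl(crawlId: string): Promise<Crawl> {
  // The real SDK would fetch the crawl from the API by ID.
  return {
    id: crawlId,
    status: "succeeded",
    startUrls: ["https://example.com"],
    totalPages: 42,
    completedAt: "2024-01-01T00:00:00.000Z",
  };
}

getCrawl("crawl_123").then((crawl) => {
  console.log(`Crawl ${crawl.id}: ${crawl.status}, ${crawl.totalPages} pages`);
});
```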
### Query crawls

### Get crawl results
Retrieve the results of your crawl using the crawl ID:
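A sketch of fetching results by crawl ID. The `getCrawlResults` function is a hypothetical, mocked stand-in for the SDK call; the field names come from the CrawlResult object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's results call; the field
// names come from the CrawlResult object documented below.
interface CrawlResult {
  id: string;
  crawlId: string;
  url: string;
  title: string;
  markdown: string;
  isSuccess: boolean;
  error: string;
}

async function getCrawlResults(crawlId: string): Promise<CrawlResult[]> {
  // The real SDK would fetch these from the API by crawl ID.
  return [
    {
      id: "result_1",
      crawlId,
      url: "https://example.com",
      title: "Example Domain",
      markdown: "# Example Domain",
      isSuccess: true,
      error: "",
    },
  ];
}

getCrawlResults("crawl_123").then((results) => {
  for (const page of results) {
    if (page.isSuccess) console.log(page.url, "->", page.title);
  }
});
```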
### Abort a crawl

### Get crawl logs
Retrieve logs for a specific crawl to monitor its progress and debug issues:
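A sketch of log retrieval, again with a hypothetical, mocked `getCrawlLogs` function; the field names come from the CrawlLog object documented below:

```typescript
// Hypothetical, mocked stand-in for the SDK's log-retrieval call; the
// field names come from the CrawlLog object documented below.
interface CrawlLog {
  id: string;
  message: string;
  level: "info" | "debug" | "warn" | "error";
  crawlId: string;
  createdAt: Date;
}

async function getCrawlLogs(crawlId: string): Promise<CrawlLog[]> {
  // The real SDK would fetch log entries from the API by crawl ID.
  return [
    { id: "log_1", message: "Crawl started", level: "info", crawlId, createdAt: new Date() },
  ];
}

getCrawlLogs("crawl_123").then((logs) => {
  for (const log of logs) {
    console.log(`[${log.level}] ${log.message}`);
  }
});
```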
## Objects

### RunCrawlPayload
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrl | string | ✅ | - | The URL to start crawling from |
| maxResults | number | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | - | Maximum number of results to collect (1-10,000) |
| useSitemap | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to use sitemap.xml to crawl the website |
| entireWebsite | boolean | ❌ (conditional: either `maxResults`/`useSitemap` or `entireWebsite` must be provided) | false | Whether to crawl the entire website |
| maxDepth | number | ❌ | 10 | Maximum depth of pages to crawl (1-100) |
| includeLinks | boolean | ❌ | true | Whether to include links in the crawl results’ markdown |
| excludeNonMainTags | boolean | ❌ | true | Whether to exclude non-main tags from the crawl results’ markdown |
| deduplicateContent | boolean | ❌ | true | Whether to remove duplicate text fragments that appeared on other pages |
| extraction | string | ❌ | "" | Instructions defining how the AI should extract specific content from the crawl results |
| timeout | number | ❌ | 60 | Timeout duration in minutes |
| engineType | string | ❌ | "auto" | The engine to use: "auto", "cheerio" (fast, static sites), or "playwright" (dynamic sites) |
| useStaticIps | boolean | ❌ | false | Whether to use static IPs for the crawl |
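The table above can be summarized as a TypeScript shape. The field names, ranges, and defaults are taken from the table; the interface itself is an illustration, not SDK source:

```typescript
// Illustrative TypeScript shape of RunCrawlPayload, derived from the table
// above; not the SDK's own type definition.
interface RunCrawlPayload {
  startUrl: string;                 // required
  maxResults?: number;              // 1-10,000; conditional (see table)
  useSitemap?: boolean;             // default: false; conditional (see table)
  entireWebsite?: boolean;          // default: false; conditional (see table)
  maxDepth?: number;                // 1-100; default: 10
  includeLinks?: boolean;           // default: true
  excludeNonMainTags?: boolean;     // default: true
  deduplicateContent?: boolean;     // default: true
  extraction?: string;              // default: ""
  timeout?: number;                 // minutes; default: 60
  engineType?: "auto" | "cheerio" | "playwright"; // default: "auto"
  useStaticIps?: boolean;           // default: false
}

const payload: RunCrawlPayload = {
  startUrl: "https://example.com",
  entireWebsite: true,
};
```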
### Crawl
| Property | Type | Description |
|---|---|---|
| id | string | Unique crawl identifier |
| status | string | Current status: "running", "succeeded", "failed", "timed_out", "aborted", or "error" |
| startUrls | string[] | Starting URLs |
| includeLinks | boolean | Whether links are included |
| maxDepth | number | Maximum crawl depth |
| maxResults | number | Maximum number of results |
| teamId | string | Team identifier |
| projectId | string | Project identifier |
| createdAt | string | Creation timestamp |
| completedAt | string \| null | Completion timestamp |
| durationInSeconds | number | Total duration |
| totalPages | number | Count of pages extracted |
| useSitemap | boolean | Whether sitemap was used |
| extraction | string | Instructions defining how the AI should extract specific content from the crawl results |
| entireWebsite | boolean | Whether to crawl the entire website |
| excludeNonMainTags | boolean | Whether non-main tags are excluded from the crawl results’ markdown |
| deduplicateContent | boolean | Whether to remove duplicate text fragments that appeared on other pages |
| timeout | number | The timeout of the crawl in minutes |
### CrawlResult

Each crawl result includes:

| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the page result |
| crawlId | string | Unique identifier for the crawl |
| projectId | string | Project identifier |
| url | string | The crawled URL |
| title | string | Page title extracted from the HTML |
| markdown | string | Extracted content in markdown format |
| processedByAI | boolean | Whether the crawl result was processed by the AI |
| depthOfUrl | number | How deep this URL was in the crawl (0 = start URL) |
| isSuccess | boolean | Whether this page was crawled successfully |
| error | string | Error message if crawling this page failed |
| createdAt | string | ISO timestamp when the result was created |
### CrawlLog

Each crawl log includes:

| Property | Type | Description |
|---|---|---|
| id | string | Unique identifier for the log entry |
| message | string | Log message |
| level | "info" \| "debug" \| "warn" \| "error" | Log level |
| description | string | Detailed description of the log entry |
| crawlId | string | Unique identifier for the crawl |
| teamId | string \| null | Team identifier |
| data | object \| null | Additional data associated with the log |
| createdAt | Date | Timestamp when the log was created |