Query crawls

Request:

curl --request POST \
  --url https://api.olyptik.io/crawls/query \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "page": 0,
    "status": ["running"],
    "startUrls": ["<string>"]
  }'

Response:

{
  "page": 0,
  "results": [
    {
      "startUrl": "https://example.com",
      "maxResults": 55,
      "maxDepth": 10,
      "useSitemap": false,
      "entireWebsite": false,
      "excludeNonMainTags": true,
      "includeLinks": true,
      "deduplicateContent": true,
      "extraction": "Extract only pricing info",
      "engineType": "auto",
      "useStaticIps": false,
      "timeout": 1800,
      "id": "6870e36787c81925622df818",
      "createdAt": "2023-11-07T05:31:56Z",
      "status": "timed_out",
      "completedAt": "2023-11-07T05:31:56Z",
      "durationInSeconds": 1800,
      "brandId": "<string>",
      "startUrls": ["https://example.com"],
      "totalPages": 100,
      "origin": "web"
    }
  ],
  "totalPages": 0,
  "totalResults": 0,
  "limit": 20
}
Authorization (header): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body

page (integer): Page number to query. Required range: x >= 0.
status (enum[]): Status of the crawls to query; if left empty, all statuses will be queried. Available options: running, succeeded, failed, aborted, timed_out, error.
startUrls (string[]): Start URLs of the crawls to query; if left empty, all start URLs will be queried.
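Putting the request together in code: a minimal Python sketch of the same query using only the standard library. The endpoint URL, headers, and body fields are taken from the curl example above; the token value is a placeholder, and the request is built but not sent (sending is left to `urllib.request.urlopen`).

```python
import json
import urllib.request

API_URL = "https://api.olyptik.io/crawls/query"

def build_query_request(token, page=0, status=None, start_urls=None):
    """Build (but do not send) a POST request for the crawls query endpoint."""
    if page < 0:
        raise ValueError("page must be >= 0")
    body = {"page": page}
    # status and startUrls are optional; omitting them queries all crawls.
    if status:
        body["status"] = status
    if start_urls:
        body["startUrls"] = start_urls
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_query_request("<token>", page=0, status=["running"])
# Send with: urllib.request.urlopen(req)
```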
Response: 200 - Paginated results with matching crawls.

page (integer): Required range: x >= 0.
results (object[]): The matching crawls. Child attributes:

startUrl (string): URL to start crawling from. Example: "https://example.com"
maxResults (integer): Maximum number of results to collect. Required range: 1 <= x <= 110. Example: 55
maxDepth (integer): Maximum depth of pages to crawl. Required range: 1 <= x <= 99. Example: 10
useSitemap (boolean): Whether to use sitemap.xml to crawl the website. If true, maxResults and maxDepth will be ignored. Example: false
entireWebsite (boolean): Whether to crawl the entire website. If true, maxResults and maxDepth will be ignored. Example: false
excludeNonMainTags (boolean): Whether to exclude non-main tags from the crawl results' markdown. Example: true
includeLinks (boolean): Whether to include links in the crawl results' markdown. Example: true
deduplicateContent (boolean): Whether to remove duplicate text fragments that appeared on other pages. Example: true
extraction (string): Instructions defining how the AI should extract specific content from the crawl results. Example: "Extract only pricing info"
engineType (enum): The engine to use for the crawl. auto: auto-detect the best engine (default). cheerio: fast, great for static websites. playwright: great for dynamic websites that use JavaScript frameworks. Available options: auto, cheerio, playwright. Example: "auto"
useStaticIps (boolean): Whether to use static IPs for the crawl. The target website can whitelist the IP used for the crawl. The static IP will be 154.17.150.0. Example: false
timeout (integer): Timeout duration in seconds. Required range: x >= 60. Example: 1800
id (string): Identification number of the crawl. Example: "6870e36787c81925622df818"
createdAt (string): Timestamp when the crawl was created. Example: "2023-11-07T05:31:56Z"
status (enum): Current status of the crawl. Available options: running, succeeded, failed, aborted, timed_out, error. Example: "timed_out"
completedAt (string): Timestamp when the crawl was completed. Example: "2023-11-07T05:31:56Z"
durationInSeconds (integer): Duration of the crawl in seconds. Required range: x >= 0. Example: 1800
brandId (string): ID of the brand associated with the crawl.
startUrls (string[]): Array of URLs to start crawling from. Example: ["https://example.com"]
totalPages (integer): Count of pages extracted. Required range: x >= 0. Example: 100
origin (enum): Origin of the crawl request. Available options: api, web. Example: "web"
totalPages (integer): Required range: x >= 0.
totalResults (integer): Required range: x >= 0.
limit (integer): Required range: x >= 0.
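Because responses are paginated, callers typically loop until page reaches totalPages. A sketch of that loop in Python, with the HTTP call stubbed out: fetch_page here is a stand-in for whatever client you use, and the field names (results, totalPages) match the response envelope above.

```python
def collect_all_crawls(fetch_page):
    """Gather results across pages; fetch_page(page) returns one response dict."""
    crawls = []
    page = 0
    while True:
        resp = fetch_page(page)
        crawls.extend(resp["results"])
        page += 1
        # totalPages on the envelope is the number of result pages available.
        if page >= resp["totalPages"]:
            break
    return crawls

# Example with a stubbed two-page response:
fake_pages = [
    {"page": 0, "results": [{"id": "a"}], "totalPages": 2, "totalResults": 2, "limit": 20},
    {"page": 1, "results": [{"id": "b"}], "totalPages": 2, "totalResults": 2, "limit": 20},
]
all_crawls = collect_all_crawls(lambda p: fake_pages[p])
```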