Skip to main content
POST
/
crawls
cURL
curl --request POST \
  --url https://api.olyptik.io/crawls \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "startUrl": "https://example.com",
  "maxResults": 5000,
  "maxDepth": 10,
  "useSitemap": false,
  "entireWebsite": false,
  "excludeNonMainTags": true,
  "includeLinks": true,
  "deduplicateContent": true,
  "extraction": "Extract only pricing info about the product",
  "engineType": "auto",
  "useStaticIps": false,
  "timeout": 60
}'
{
  "startUrl": "https://example.com",
  "maxResults": 55,
  "maxDepth": 10,
  "useSitemap": false,
  "entireWebsite": false,
  "excludeNonMainTags": true,
  "includeLinks": true,
  "deduplicateContent": true,
  "extraction": "Extract only pricing info",
  "engineType": "auto",
  "useStaticIps": false,
  "timeout": 1800,
  "id": "6870e36787c81925622df818",
  "createdAt": "2023-11-07T05:31:56Z",
  "status": "timed_out",
  "completedAt": "2023-11-07T05:31:56Z",
  "durationInSeconds": 1800,
  "brandId": "<string>",
  "startUrls": [
    "https://example.com"
  ],
  "totalPages": 100,
  "origin": "web"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

maxResults and maxDepth will be ignored if useSitemap or entireWebsite is true

startUrl
string<uri>
required

URL to start crawling from

Example:

"https://example.com"

maxResults
integer

Maximum number of results to collect

Required range: 1 <= x <= 10000
maxDepth
integer
default:10

Maximum depth of pages to crawl

Required range: 1 <= x <= 100
useSitemap
boolean
default:false

Whether to use sitemap.xml to crawl the website. If true - maxResults and maxDepth will be ignored.

Example:

false

entireWebsite
boolean
default:false

Whether to crawl the entire website. If true - maxResults and maxDepth will be ignored.

Example:

false

excludeNonMainTags
boolean
default:true

Whether to exclude non-main tags from the crawl results' markdown

Example:

true

Whether to include links in the crawl results' markdown

Example:

true

deduplicateContent
boolean
default:true

Whether to remove duplicate text fragments that appeared on other pages.

Example:

true

extraction
string
default:""

Instructions defining how the AI should extract specific content from the crawl results

Example:

"Extract only pricing info about the product"

engineType
enum<string>
default:auto

The engine to use for the crawl. Auto: auto detect the best engine (default). Cheerio: fast, great for static websites. Playwright: great for dynamic websites that use JavaScript frameworks.

Available options:
auto,
cheerio,
playwright
Example:

"auto"

useStaticIps
boolean
default:false

Whether to use static IPs for the crawl. This target website can whitelist the IPs to use for the crawl. The static IP will be 154.17.150.0

Example:

false

timeout
integer

Timeout duration in minutes

Required range: x >= 60
Example:

60

Response

Crawl object

startUrl
string<uri>
required

URL to start crawling from

Example:

"https://example.com"

maxResults
integer

Maximum number of results to collect

Required range: 1 <= x <= 110
maxDepth
integer
default:10

Maximum depth of pages to crawl

Required range: 1 <= x <= 99
Example:

10

useSitemap
boolean
default:false

Whether to use sitemap.xml to crawl the website. If true - maxResults and maxDepth will be ignored.

Example:

false

entireWebsite
boolean
default:false

Whether to crawl the entire website. If true - maxResults and maxDepth will be ignored.

Example:

false

excludeNonMainTags
boolean
default:true

Whether to exclude non-main tags from the crawl results' markdown

Example:

true

Whether to include links in the crawl results' markdown

Example:

true

deduplicateContent
boolean
default:true

Whether to remove duplicate text fragments that appeared on other pages.

Example:

true

extraction
string
default:""

Instructions defining how the AI should extract specific content from the crawl results

Example:

"Extract only pricing info"

engineType
enum<string>
default:auto

The engine to use for the crawl. Auto: auto detect the best engine (default). Cheerio: fast, great for static websites. Playwright: great for dynamic websites that use JavaScript frameworks.

Available options:
auto,
cheerio,
playwright
Example:

"auto"

useStaticIps
boolean
default:false

Whether to use static IPs for the crawl. This target website can whitelist the IPs to use for the crawl. The static IP will be 154.17.150.0

Example:

false

timeout
integer
default:1800

Timeout duration in minutes Timeout duration in seconds

Required range: x >= 60
Example:

1800

id
string

Identification number of the crawl

Example:

"6870e36787c81925622df818"

createdAt
string<date-time>

Timestamp when the crawl was created

status
enum<string>

Current status of the crawl

Available options:
running,
succeeded,
failed,
aborted,
timed_out,
error
Example:

"timed_out"

completedAt
string<date-time>

Timestamp when the crawl was completed

durationInSeconds
integer

Duration of the crawl in seconds

Required range: x >= 0
Example:

1800

brandId
string

ID of the brand associated with the crawl

startUrls
string<uri>[]

Array of URLs to start crawling from

Example:
["https://example.com"]
totalPages
integer

Count of pages extracted

Required range: x >= 0
Example:

100

origin
enum<string>

Origin of the crawl request

Available options:
api,
web
Example:

"web"

I