GET /crawls/{id}
cURL
curl --request GET \
  --url https://api.olyptik.io/crawls/{id} \
  --header 'Authorization: Bearer <token>'
{
  "startUrl": "https://example.com",
  "maxResults": 55,
  "maxDepth": 10,
  "useSitemap": false,
  "entireWebsite": false,
  "excludeNonMainTags": true,
  "includeLinks": true,
  "deduplicateContent": true,
  "extraction": "Extract only pricing info",
  "engineType": "auto",
  "useStaticIps": false,
  "timeout": 1800,
  "id": "6870e36787c81925622df818",
  "createdAt": "2023-11-07T05:31:56Z",
  "status": "timed_out",
  "completedAt": "2023-11-07T05:31:56Z",
  "durationInSeconds": 1800,
  "brandId": "<string>",
  "startUrls": [
    "https://example.com"
  ],
  "totalPages": 100,
  "origin": "web"
}
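
For reference alongside the cURL call, here is the same request sketched in Python. It assumes the third-party requests library and a token stored in an OLYPTIK_API_KEY environment variable; both the variable name and the get_crawl helper are illustrative, not part of the API.

Python

# Equivalent of the cURL call above, sketched with the `requests` library.
# OLYPTIK_API_KEY and get_crawl are assumptions for this example.
import os

import requests

API_BASE = "https://api.olyptik.io"

def get_crawl(crawl_id: str) -> dict:
    """Fetch a single crawl object by its ID."""
    token = os.environ["OLYPTIK_API_KEY"]  # assumed env var name
    response = requests.get(
        f"{API_BASE}/crawls/{crawl_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

crawl = get_crawl("6870e36787c81925622df818")
print(crawl["status"], crawl.get("totalPages"))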

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

id
string
required

ID of the crawl to return

Response

200 - application/json

Crawl object

startUrl
string<uri>
required

URL to start crawling from

Example:

"https://example.com"

maxResults
integer

Maximum number of results to collect

Required range: 1 <= x <= 110
maxDepth
integer
default:10

Maximum depth of pages to crawl

Required range: 1 <= x <= 99
Example:

10

useSitemap
boolean
default:false

Whether to use sitemap.xml to crawl the website. If true, maxResults and maxDepth are ignored.

Example:

false

entireWebsite
boolean
default:false

Whether to crawl the entire website. If true, maxResults and maxDepth are ignored.

Example:

false
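
Because useSitemap and entireWebsite each cause maxResults and maxDepth to be ignored, a client reading a crawl object may want to derive its effective scope. A minimal sketch; the effective_scope helper is hypothetical, not part of the API.

Python

# Hypothetical helper: derive the effective crawl scope from a crawl
# object, honoring the documented precedence that useSitemap and
# entireWebsite both cause maxResults and maxDepth to be ignored.
def effective_scope(crawl: dict) -> str:
    if crawl.get("entireWebsite"):
        return "entire website (maxResults/maxDepth ignored)"
    if crawl.get("useSitemap"):
        return "sitemap.xml (maxResults/maxDepth ignored)"
    return (
        f"up to {crawl.get('maxResults')} results, "
        f"depth <= {crawl.get('maxDepth', 10)}"
    )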

excludeNonMainTags
boolean
default:true

Whether to exclude non-main tags from the crawl results' markdown

Example:

true

includeLinks
boolean

Whether to include links in the crawl results' markdown

Example:

true

deduplicateContent
boolean
default:true

Whether to remove duplicate text fragments that appeared on other pages.

Example:

true

extraction
string
default:""

Instructions defining how the AI should extract specific content from the crawl results

Example:

"Extract only pricing info"

engineType
enum<string>
default:auto

The engine to use for the crawl. auto: automatically detect the best engine (default). cheerio: fast, great for static websites. playwright: great for dynamic websites that use JavaScript frameworks.

Available options:
auto,
cheerio,
playwright
Example:

"auto"

useStaticIps
boolean
default:false

Whether to use static IPs for the crawl, so the target website can whitelist the IP used for the crawl. The static IP will be 154.17.150.0

Example:

false

timeout
integer
default:1800

Timeout duration in seconds; the default of 1800 corresponds to 30 minutes.

Required range: x >= 60
Example:

1800

id
string

Identification number of the crawl

Example:

"6870e36787c81925622df818"

createdAt
string<date-time>

Timestamp when the crawl was created

status
enum<string>

Current status of the crawl

Available options:
running,
succeeded,
failed,
aborted,
timed_out,
error
Example:

"timed_out"

completedAt
string<date-time>

Timestamp when the crawl was completed

durationInSeconds
integer

Duration of the crawl in seconds

Required range: x >= 0
Example:

1800

brandId
string

ID of the brand associated with the crawl

startUrls
string<uri>[]

Array of URLs to start crawling from

Example:

["https://example.com"]

totalPages
integer

Count of pages extracted

Required range: x >= 0
Example:

100

origin
enum<string>

Origin of the crawl request

Available options:
api,
web
Example:

"web"