POST /scrape
cURL
curl --request POST \
  --url https://api.olyptik.io/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2"
  ],
  "includeLinks": true,
  "excludeNonMainTags": true,
  "timeout": 60,
  "engineType": "auto",
  "useStaticIps": false,
  "deduplicateContent": true,
  "extraction": "Extract pricing information"
}
'
{
  "id": "67890abc123def456789",
  "teamId": "team_123",
  "urls": [
    "https://example.com/page1",
    "https://example.com/page2"
  ],
  "results": [
    {
      "url": "https://example.com/page1",
      "isSuccess": true,
      "title": "Page Title",
      "markdown": "# Page Title\n\nPage content here...",
      "links": [
        "<string>"
      ],
      "duplicatesRemovedCount": 0,
      "errorCode": null,
      "errorMessage": null
    }
  ],
  "timeout": 60,
  "origin": "api",
  "projectId": "project_123",
  "createdAt": "2025-01-15T10:30:00Z",
  "updatedAt": "2025-01-15T10:31:00Z"
}
The scrape endpoint scrapes up to 30 URLs in a single request. Use it when you need the content of specific, known pages and do not need to crawl a site.
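For readers calling the endpoint from Python rather than cURL, the request above could be assembled roughly as follows. This is a minimal sketch using only the standard library; the helper names `build_scrape_request` and `send`, and the 90-second client-side timeout, are illustrative assumptions and not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the cURL example on this page.
SCRAPE_URL = "https://api.olyptik.io/scrape"


def build_scrape_request(token: str, urls: list[str], **options) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request.

    `options` are passed through as top-level body fields,
    e.g. engineType="auto" or timeout=60.
    """
    payload = {"urls": urls, **options}
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, client_timeout: float = 90.0) -> dict:
    """Send the request and decode the JSON response body.

    The client-side timeout is an assumption: it is set above the
    default 60 s scrape timeout so the client does not give up first.
    """
    with urllib.request.urlopen(request, timeout=client_timeout) as response:
        return json.load(response)
```

Keeping construction separate from sending makes it possible to inspect the headers and body before any network call is made, for example `send(build_scrape_request("<token>", ["https://example.com/page1"], engineType="auto"))`.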

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Scrape request payload

urls
string<uri>[]
required

Array of URLs to scrape (max 30)

Required array length: 1 - 30 elements
Example:
[
"https://example.com/page1",
"https://example.com/page2"
]
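Because `urls` accepts between 1 and 30 entries, a longer list has to be split across several requests. One way to do that is sketched below; `batch_urls` is a hypothetical helper name, and the API itself does not prescribe how callers should batch.

```python
from typing import Iterator, Sequence

# Documented upper bound for the `urls` array.
MAX_URLS_PER_REQUEST = 30


def batch_urls(urls: Sequence[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each with at most `size` entries."""
    if not 1 <= size <= MAX_URLS_PER_REQUEST:
        raise ValueError(f"size must be between 1 and {MAX_URLS_PER_REQUEST}")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded batch can then be sent as the `urls` field of its own scrape request. An empty input yields no batches, which avoids sending a request that would violate the minimum length of 1.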

includeLinks
boolean

Whether to include links in the markdown output

excludeNonMainTags
boolean
default:true

Whether to exclude non-main tags from the markdown

timeout
integer
default:60

Timeout in seconds for the scrape operation

Required range: x >= 1

engineType
enum<string>
default:auto

The engine to use for scraping

Available options: auto, cheerio, playwright

useStaticIps
boolean
default:false

Whether to use static IPs for scraping

deduplicateContent
boolean
default:true

Whether to remove duplicate content

extraction
string
default:""

AI instructions for extracting specific content

Example:

"Extract pricing information"
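The body parameters and their documented defaults can be mirrored in a small client-side type that checks the stated constraints before a request is sent. The sketch below is an illustration only: `ScrapeOptions` and `to_payload` are hypothetical names, and `include_links` is left unset by default because this page does not state a default for `includeLinks`.

```python
from dataclasses import dataclass
from typing import Optional

# Engine values listed under `engineType`.
ENGINE_TYPES = ("auto", "cheerio", "playwright")


@dataclass
class ScrapeOptions:
    # No default is documented for includeLinks, so it is omitted unless set.
    include_links: Optional[bool] = None
    exclude_non_main_tags: bool = True
    timeout: int = 60  # seconds
    engine_type: str = "auto"
    use_static_ips: bool = False
    deduplicate_content: bool = True
    extraction: str = ""

    def to_payload(self, urls: list[str]) -> dict:
        """Validate the documented constraints and return the JSON body as a dict."""
        if not 1 <= len(urls) <= 30:
            raise ValueError("urls must contain between 1 and 30 entries")
        if self.timeout < 1:
            raise ValueError("timeout must be >= 1")
        if self.engine_type not in ENGINE_TYPES:
            raise ValueError(f"engineType must be one of {ENGINE_TYPES}")
        payload = {
            "urls": list(urls),
            "excludeNonMainTags": self.exclude_non_main_tags,
            "timeout": self.timeout,
            "engineType": self.engine_type,
            "useStaticIps": self.use_static_ips,
            "deduplicateContent": self.deduplicate_content,
            "extraction": self.extraction,
        }
        if self.include_links is not None:
            payload["includeLinks"] = self.include_links
        return payload
```

For example, `ScrapeOptions(extraction="Extract pricing information").to_payload(["https://example.com/page1"])` produces a body that uses the documented defaults for every other field.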

Response

Scrape response with results for all URLs

id
string

Unique identifier for the scrape operation

Example:

"67890abc123def456789"

teamId
string

ID of the team that initiated the scrape

Example:

"team_123"

urls
string<uri>[]

Array of URLs that were scraped

Example:
[
"https://example.com/page1",
"https://example.com/page2"
]

results
object[]

Results for each URL

timeout
integer

Timeout used for the scrape operation in seconds

Example:

60

origin
string

Origin of the scrape request

Example:

"api"

projectId
string

Project ID associated with the scrape

Example:

"project_123"

createdAt
string<date-time>

Timestamp when the scrape was created

Example:

"2025-01-15T10:30:00Z"

updatedAt
string<date-time>

Timestamp when the scrape was last updated

Example:

"2025-01-15T10:31:00Z"
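Since each entry in `results` carries its own `isSuccess` flag, a caller will usually want to separate pages that were scraped from pages that failed. The following is a minimal sketch based on the field names in the example response; `split_results` and `markdown_by_url` are hypothetical helpers, and it is an assumption that failed entries populate `errorCode` and `errorMessage` while successful ones leave them null.

```python
def split_results(response: dict) -> tuple[list[dict], list[dict]]:
    """Partition the `results` array into (succeeded, failed) by `isSuccess`."""
    succeeded: list[dict] = []
    failed: list[dict] = []
    for result in response.get("results", []):
        if result.get("isSuccess"):
            succeeded.append(result)
        else:
            failed.append(result)
    return succeeded, failed


def markdown_by_url(response: dict) -> dict[str, str]:
    """Map each successfully scraped URL to its markdown content."""
    succeeded, _ = split_results(response)
    return {r["url"]: r.get("markdown", "") for r in succeeded}
```

The failed list can then be logged or retried, for instance by resubmitting only those URLs in a new request.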