# Chocodata - full documentation corpus

> Single-file concatenation of every page at https://chocodata.com/docs.
> For the manifest (URL list) only, see https://chocodata.com/docs/llms.txt.
> For an individual page as Markdown, append `.md` to the URL.

---

# Getting started

_Source: https://chocodata.com/docs/getting-started_

# Getting started

Chocodata is one scraping API for the whole web: a single REST request returns structured JSON from 235 sites and 453 endpoints. You will have a working scrape in under 2 minutes.

## 1. Create an API key

1. Sign in at [app.chocodata.com](https://app.chocodata.com) (Google sign-in or email, no card required for the free tier).
2. Open **API keys** in the left sidebar.
3. Click **Create key**, name it (for example `local-dev`), and copy the value. It is only shown once.

Your key looks like `cd_live_xxxxxxxxxxxxxxxxxxxxxxxx`. Treat it like a password. The free tier gives you 1,000 requests a month to try everything below.

## 2. The one request shape

Every endpoint follows the same pattern. You change the site and the resource; everything else stays the same:

```http
GET https://api.chocodata.com/api/v1/{site}/{resource}?api_key=YOUR_KEY
```

`{site}` is the target (`walmart`, `ebay`, `google`, `indeed`, ...), `{resource}` is what you want from it (`product`, `search`, `job`, ...), and `api_key` is your key as a query parameter. That is the only authentication you need.

## 3. Your first request - cURL

Let's pull a product from Walmart:

```bash
curl "https://api.chocodata.com/api/v1/walmart/product?api_key=YOUR_KEY&query=5085206428"
```

You get back clean JSON:

```json
{
  "id": "5085206428",
  "title": "LEGEND COOKWARE 5-Ply Stainless Steel Cookware Set",
  "price": 299.99,
  "currency": "USD",
  "rating": 4.7,
  "reviews_count": 2481,
  "images": ["https://cdn.example.com/images/cookware-set-71..."],
  "brand": "LEGEND COOKWARE",
  "category": [],
  "offers": []
}
```

The exact same pattern works for any of the 235 sites and 453 endpoints:

```bash
# Keyword search on a search engine
curl "https://api.chocodata.com/api/v1/google/search?api_key=YOUR_KEY&query=wireless+headphones"

# A marketplace search
curl "https://api.chocodata.com/api/v1/ebay/search?api_key=YOUR_KEY&query=mechanical+keyboard"

# A single job posting
curl "https://api.chocodata.com/api/v1/indeed/job?api_key=YOUR_KEY&query=https://www.indeed.com/viewjob?jk=abc123"
```

Typical response time: **~2.6 s median**, **~6 s p95**.

## 4. The same call in Python

```python
import requests

r = requests.get(
    "https://api.chocodata.com/api/v1/walmart/product",
    params={"api_key": "cd_live_YOUR_KEY", "query": "5085206428"},
    timeout=30,  # client-side limit chosen for this example, not an API requirement
)
r.raise_for_status()  # raise on any non-2xx response
product = r.json()
print(product["title"], "-", product["price"], product["currency"])
```

## 5. The same call in Node.js

```javascript
// Top-level await requires an ES module (for example a .mjs file) on Node 18+.
const res = await fetch(
  "https://api.chocodata.com/api/v1/walmart/product?query=5085206428&api_key=cd_live_YOUR_KEY"
);
if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
const product = await res.json();
console.log(product.title, "-", product.price, product.currency);
```

Prefer a typed client? We publish official [SDKs for Node, Python, Go, a CLI, and an MCP server](/docs/guides/sdks).

## What about sites without a dedicated endpoint?

There are two surfaces:

- **Dedicated endpoints** return clean, validated JSON for a supported site and page type (the call above). Browse all of them in the [scraper directory](/scraper-api).
- **The [Universal Web Scraper API](/docs/endpoints/universal)** fetches any other URL and returns JSON, HTML, or text.

```bash
# Scrape any URL the directory does not cover yet
curl "https://api.chocodata.com/api/v1/universal/get?api_key=YOUR_KEY&url=https://example.com&parse=auto"
```

[Core concepts](/docs/core-concepts) explains how to choose between the two.

## What costs a credit?

**Only successful (HTTP 2xx) responses.** Errors, target-site challenge pages, timeouts, and `404`s never touch your balance. One request costs 5 credits; plans are sized in requests. See [Billing](/docs/guides/billing) and [pricing](https://chocodata.com/pricing) for the full policy.

## Common errors

| HTTP | Body contains | What it means | What to do |
|---|---|---|---|
| 400 | `invalid_params` | `query` or `domain` is malformed | Check your `query` value and that `domain` (if used) is supported for the target |
| 401 | `unauthorized` | API key missing or wrong | Verify your `?api_key=` value |
| 404 | `not_found` | The item is delisted, malformed, or does not exist on this site | No retry; drop the item or try a different region |
| 429 | `rate_limited` | Too many requests from your key | Back off and respect `Retry-After` |
| 502 | `target_unreachable` / `extraction_failed` | The target blocked us on all retries, or served a page we could not parse | Retry in ~30 s; usually transient |

Full list with retry semantics: [Error codes](/docs/guides/errors).

## Where to go next

- [Core concepts](/docs/core-concepts) - how the API works, dedicated vs Universal, credits.
- [Endpoint reference](/docs/endpoint-reference) - resource types and how to find any endpoint.
- [Product endpoint](/docs/endpoints/product) and [Search endpoint](/docs/endpoints/search) - the two canonical patterns.
- [Scraper directory](/scraper-api) - browse all 453 endpoints.
- [Country + content language](/docs/guides/country-and-language) - regional storefronts and languages.
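The retry guidance in the Common errors table on this page can be sketched as a small decision function. This is an illustration, not part of any official SDK: `retry_delay`, `MAX_ATTEMPTS`, and `DEFAULT_BACKOFF_S` are hypothetical names, the attempt cap is an arbitrary choice, and the sketch assumes `Retry-After` carries whole seconds.

```python
from typing import Mapping, Optional

MAX_ATTEMPTS = 3          # hypothetical cap chosen for this sketch, not a documented limit
DEFAULT_BACKOFF_S = 30.0  # the error table suggests retrying a 502 in ~30 s


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None to stop.

    `attempt` is the number of attempts already made (1 after the first call).
    """
    if attempt >= MAX_ATTEMPTS:
        return None
    if status == 429:
        raw = headers.get("Retry-After", "")
        # Assumption: Retry-After is given in whole seconds; otherwise fall back.
        return float(raw) if raw.isdigit() else DEFAULT_BACKOFF_S
    if status == 502:
        return DEFAULT_BACKOFF_S
    # 2xx needs no retry; 400, 401, and 404 will not succeed on a retry.
    return None


print(retry_delay(429, {"Retry-After": "12"}, attempt=1))
print(retry_delay(404, {}, attempt=1))
```

A plain `dict` lookup is case-sensitive, so pass a case-insensitive mapping (such as the `headers` attribute of a `requests` response) if the header casing may vary.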
---

# Core concepts - how the API works

_Source: https://chocodata.com/docs/core-concepts_

# Core concepts

Chocodata is one web-scraping API for the whole web. Instead of learning a different API for every site, you learn a single request shape and point it at whichever site and resource you need. Today that covers 235 sites, 453 endpoints, and 17 categories, with 210 dedicated specific-item endpoints and 250+ endpoints returning validated, structured JSON. Anything without a dedicated endpoint is still scrapable through the Universal Web Scraper API.

This page explains the model. Once you have it, every endpoint in the [scraper directory](/scraper-api) works the same way.

## The one request shape

Every dedicated endpoint is a GET request in exactly this form:

```http
GET https://api.chocodata.com/api/v1/{site}/{resource}?api_key=YOUR_KEY
```

- `{site}` is the target, for example `walmart`, `ebay`, `google`, `indeed`, `zillow`, `github`, `youtube`.
- `{resource}` is what you want from that site, for example `product`, `search`, `job`, `property`, `profile`, `video`.
- `api_key` is your key, passed as a query parameter. That is the only authentication mechanism. See [Authentication](/docs/guides/authentication).

Most resources take a `query` parameter (an identifier, a URL, or a keyword) plus optional modifiers like `domain`, `language`, or `pages`. The same envelope holds across every site, so once you have called one endpoint you can call all of them.
```bash
# A job listing on Indeed
curl "https://api.chocodata.com/api/v1/indeed/job?api_key=YOUR_KEY&query=https://www.indeed.com/viewjob?jk=abc123"

# A repository on GitHub
curl "https://api.chocodata.com/api/v1/github/repository?api_key=YOUR_KEY&query=facebook/react"

# A keyword search on Bing
curl "https://api.chocodata.com/api/v1/bing/search?api_key=YOUR_KEY&query=best+running+shoes"
```

## Two ways to scrape: dedicated endpoints vs Universal

There are two surfaces, and you choose based on whether the site and page type have a dedicated parser.

### 1. Dedicated specific-item endpoints (structured JSON)

A dedicated endpoint targets one site and one page type and returns clean, validated, structured JSON: named fields, correct types, no HTML to parse. Chocodata ships 210 dedicated specific-item endpoints (product, article, job, profile, listing, video, and more), and 250+ endpoints in total return validated JSON.

Use a dedicated endpoint when one exists for your target. You get a typed object instead of raw markup, and the parser is maintained for you when the site changes its layout.

```http
GET /api/v1/{site}/{resource}?api_key=YOUR_KEY&query=...
```

Browse every dedicated endpoint in the [scraper directory](/scraper-api): cards marked **JSON** return structured data.

### 2. The Universal Web Scraper API (any URL)

For pages without a dedicated parser (or any arbitrary URL you want to fetch), use the Universal Web Scraper API. You hand it a URL, it handles proxies and anti-bot, and returns the page in the format you ask for: raw HTML, plain text, or auto-extracted JSON.

```http
GET /api/v1/universal/get?api_key=YOUR_KEY&url=https://example.com&parse=auto
```

In the [scraper directory](/scraper-api), cards marked **via Universal** route through this endpoint. Full details on the [Universal Web Scraper API](/docs/endpoints/universal) page.
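As a rough illustration of the two surfaces, the sketch below builds request URLs for each path with the standard library only. The helper names `dedicated_url` and `universal_url` are hypothetical and not part of an official client, and percent-encoding the parameter values is a choice made here; the raw examples above pass some values unencoded.

```python
from urllib.parse import urlencode

BASE = "https://api.chocodata.com/api/v1"


def dedicated_url(site: str, resource: str, api_key: str, **params: str) -> str:
    """Build a dedicated-endpoint URL: /api/v1/{site}/{resource}?api_key=...&query=..."""
    query = urlencode({"api_key": api_key, **params})
    return f"{BASE}/{site}/{resource}?{query}"


def universal_url(target_url: str, api_key: str, parse: str = "auto") -> str:
    """Build a Universal Web Scraper URL for any web page."""
    query = urlencode({"api_key": api_key, "url": target_url, "parse": parse})
    return f"{BASE}/universal/get?{query}"


print(dedicated_url("github", "repository", "YOUR_KEY", query="facebook/react"))
print(universal_url("https://example.com", "YOUR_KEY"))
```

Encoding every value keeps characters such as `&` or `?` inside a target URL from being read as part of the outer query string.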
| | Dedicated endpoint | Universal Web Scraper API |
|---|---|---|
| Path | `/api/v1/{site}/{resource}` | `/api/v1/universal/get` |
| Input | `query` (id, URL, or keyword) | `url` (any web page) |
| Output | Validated structured JSON | HTML, text, or auto-parsed JSON |
| Best for | Supported site + page type | Any other URL, or full-page HTML |

## Resources, not sites

The catalogue is organized by resource type, not by an exhaustive page-by-page list. A handful of resource types appear across hundreds of sites:

| Resource | What it returns | Example sites |
|---|---|---|
| `search` | Ranked results for a keyword query | `google`, `bing`, `walmart`, `ebay`, `youtube` |
| `product` | A single product page | `walmart`, `ebay`, `target`, `bestbuy`, `etsy` |
| `article` | A single news or blog article | `bbc`, `reuters`, `techcrunch`, `medium` |
| `job` | A single job posting | `indeed`, `linkedin`, `glassdoor` |
| `property` | A real-estate listing | `zillow`, `redfin`, `realtor` |
| `profile` | A public profile page | `github`, social sites, directories |
| `quote` | A finance quote | `yahoo-finance`, `tradingview` |
| `video` | A single video page | `youtube`, `vimeo` |

Learn the request/response shape for one resource and it carries to every site that exposes it. The [Endpoint reference](/docs/endpoint-reference) explains this model and how to find the exact resource for any site.

## Credits and what a request costs

Usage is metered in credits. The conversion is simple:

| Action | Credits |
|---|---|
| One request (dedicated endpoint or Universal) | 5 |
| JavaScript rendering (when shipped, opt-in add-on) | +10 |
| Screenshot (when shipped, opt-in add-on) | +10 |

Plans are sized in requests (1 request = 5 credits). The Free plan, for example, is 1,000 requests per month, which is 5,000 credits. Credits are the internal accounting unit; your plan and PAYG pricing are quoted in requests.
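The conversion in the table can be checked with a few lines of arithmetic. The function names below are hypothetical, and the add-on figures simply mirror the "+10" rows, which the table lists as not yet shipped.

```python
CREDITS_PER_REQUEST = 5  # one request, dedicated endpoint or Universal
ADDON_CREDITS = 10       # per add-on, as listed in the table for when they ship


def plan_credits(requests_per_month: int) -> int:
    """Convert a plan size quoted in requests into credits."""
    return requests_per_month * CREDITS_PER_REQUEST


def request_credits(render_js: bool = False, screenshot: bool = False) -> int:
    """Credits for one successful request, including any opt-in add-ons."""
    addons = int(render_js) + int(screenshot)
    return CREDITS_PER_REQUEST + addons * ADDON_CREDITS


print(plan_credits(1_000))                                # 5000, the Free plan
print(request_credits())                                  # 5
print(request_credits(render_js=True, screenshot=True))   # 25
```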
See [Billing](/docs/guides/billing) for the full policy and [pricing](https://chocodata.com/pricing) for plan limits.

> `render_js` and `screenshot` are reserved today and return `501 not_implemented`. They never cost anything until they ship.

## You only pay for successful responses

**Only successful (HTTP 2xx) responses are billed.** Errors, anti-bot challenge pages, timeouts, and `404`s never touch your balance. If we could not return usable data, you do not pay for it. Our orchestrator retries internally and bills at most once per request, only on a final 2xx.

The full breakdown is in the [Billing policy](/docs/guides/billing).

## The response envelope

Every dedicated endpoint returns a flat JSON object of named fields for that resource. Successful responses also carry a small set of informational headers:

- `Asa-Cost` - credits spent (`5` for a standard request, `0` for any non-2xx).
- `Asa-Resolved-Url` - the final target URL after redirects.
- `Asa-Source-Status` - the target site's own HTTP status.
- `Asa-Attempts` - how many internal attempts it took to fetch the page.
- `Asa-Extractor-Version` - the parser version, for example `walmart@1.0.0`.

Errors share one body shape across every endpoint, with a machine-readable `error` code, a `request_id`, and a `docs_url`. See [Error codes](/docs/guides/errors).

## Where to go next

- [Getting started](/docs/getting-started) - your first call in under 2 minutes.
- [Endpoint reference](/docs/endpoint-reference) - the resource model and how to find any endpoint.
- [Universal Web Scraper API](/docs/endpoints/universal) - scrape any URL to JSON, HTML, or text.
- [Scraper directory](/scraper-api) - browse all 453 endpoints.
- [Dashboard playground](https://app.chocodata.com) - run any endpoint live and copy the request.

---

# Endpoint reference

_Source: https://chocodata.com/docs/endpoint-reference_

# Endpoint reference

Chocodata exposes 470 endpoints across 237 sites and 17 categories. They are not 470 different APIs.
They are one API, parameterized by `{site}` and `{resource}`, so this page documents the shared shape once and points you to the browsable directory for the specifics of any single endpoint.

If you have read [Core concepts](/docs/core-concepts), you already know the request shape. This page is the map: what the resource types are, what the shared request and response look like, and where to look up an exact endpoint.

## The shared request shape

Every dedicated endpoint is the same GET:

```http
GET https://api.chocodata.com/api/v1/{site}/{resource}?api_key=YOUR_KEY&query=...
```

| Part | Meaning | Examples |
|---|---|---|
| `{site}` | The target site | `walmart`, `ebay`, `indeed`, `zillow`, `github`, `youtube` |
| `{resource}` | The page type on that site | `product`, `search`, `job`, `property`, `profile`, `video` |
| `query` | The identifier, URL, or keyword | a product ID, a job URL, `"running shoes"` |

Common optional parameters that many endpoints accept:

| Param | Type | Description |
|---|---|---|
| `domain` | enum | Regional storefront / locale where the site has one (`com`, `co.uk`, `de`). Supported values vary by site. See [Country, region & language](/docs/guides/country-and-language). |
| `language` | string | Content language as `xx_YY` (`en_US`, `de_DE`). |
| `pages` | int | For search resources, number of consecutive result pages to fetch (each page counts as a request). |
| `add_html` | boolean | Attach the raw page HTML under `html` in the response. |

## Resource types

Rather than memorize 470 endpoints, learn the resource types. A small set of resources covers the vast majority of endpoints, and each behaves consistently wherever it appears.
| Resource | Returns | Typical sites |
|---|---|---|
| `search` | Ranked results for a keyword query (web results or commerce cards, depending on the site) | search engines, marketplaces, job boards, app stores |
| `product` | A single product / item page with price, variants, ratings, images | e-commerce sites |
| `article` | A single news or blog article with body, author, dates | news and media sites |
| `job` | A single job posting with title, company, location, description | job boards |
| `property` | A single real-estate listing | real-estate portals |
| `profile` | A public profile page | social, developer, and directory sites |
| `quote` | A finance quote / instrument page | finance and crypto sites |
| `video` | A single video page with metadata | video platforms |
| `post` | A single social or forum post | social and community sites |
| `package` | A single package / library page | developer registries |
| `listing` | A single classifieds or directory listing | classifieds and local directories |

Categories you will see in the directory include: AI, Automotive, B2B & Companies, Developer, E-commerce, Finance & Crypto, Food, Jobs, Knowledge & Academic, Local & Directories, News & Media, Real Estate, Reviews, Search Engines, Social, and Travel.

### The two flavors of `search`

`search` is the most widely available resource, and its response adapts to the site:

- On a **search engine** (`google`, `bing`, ...), each result carries web fields: `title`, `url`, `snippet`, `position`.
- On a **marketplace or board** (`walmart`, `ebay`, `indeed`, ...), each card carries domain fields: price, rating, sponsored flags, or job metadata.

Both share the same request shape. See the [Search endpoint](/docs/endpoints/search) for the full request and both response shapes.

## The shared response shape

Dedicated endpoints return a flat JSON object of named fields for the requested resource.
The exact field set depends on the resource (a `product` has `price` and `variations`; an `article` has `articleBody` and `author`), but the conventions are constant:

- Field names are stable and typed. Numbers are numbers, not strings.
- A field is `null` when the site did not publish it, not when extraction failed. You still get a valid object.
- Search-style resources return an array (`products` or `results`) plus pagination metadata.

Every successful response also carries informational headers (`Asa-Cost`, `Asa-Resolved-Url`, `Asa-Source-Status`, `Asa-Attempts`, `Asa-Extractor-Version`). Errors share one body shape with a machine-readable `error` code. See [Error codes](/docs/guides/errors).

We document two resources in depth as the canonical pattern:

- [Product endpoint](/docs/endpoints/product) - the specific-item pattern (one identifier in, one rich object out).
- [Search endpoint](/docs/endpoints/search) - the list pattern (a keyword in, ranked results out).

Read those two and you can read any endpoint, because every other dedicated endpoint is a variation on one of them.

## App Stores

Dedicated endpoints for the Apple App Store (`appstore`) and Google Play (`googleplay`). Search a storefront, pull a single app's full listing, or page through user reviews. Every call is the same `GET https://api.chocodata.com/api/v1/{target}/{resource}?api_key=YOUR_KEY&...` shape. For a guided tour see [Scraping Social Media & App Stores](/docs/guides/social-and-app-stores).

### `appstore.search`

Search the Apple App Store by keyword and get ranked app cards.

```http
GET https://api.chocodata.com/api/v1/appstore/search?api_key=YOUR_KEY&term=whatsapp&country=us&limit=5
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `term` | string | ✅ yes | - | Search keywords |
| `country` | enum | - | `us` | Two-letter App Store storefront (`us`, `gb`, `de`, `jp`, ...) |
| `limit` | int | - | `10` | Number of result cards to return |

```json
{
  "query": "whatsapp",
  "page": 1,
  "total_results": 5,
  "results": [
    {
      "position": 1,
      "id": "310633997",
      "track_id": "310633997",
      "bundle_id": "net.whatsapp.WhatsApp",
      "title": "WhatsApp Messenger",
      "seller_name": "WhatsApp Inc.",
      "primary_genre": "Social Networking",
      "genres": ["Social Networking", "Utilities"],
      "price": 0,
      "formatted_price": "Free",
      "currency": "USD",
      "rating": 4.68698,
      "reviews_count": 18082844,
      "content_advisory_rating": "12+",
      "version": "26.22.76",
      "release_date": "2009-05-04T02:43:49Z",
      "minimum_os_version": "15.1",
      "url": "https://apps.apple.com/us/app/whatsapp-messenger/id310633997"
    }
    /* 4 more cards */
  ]
}
```

### `appstore.product`

Full listing for a single Apple App Store app by numeric ID.

```http
GET https://api.chocodata.com/api/v1/appstore/product?api_key=YOUR_KEY&id=310633997&country=us
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `id` | string | ✅ yes | - | The numeric App Store track ID |
| `country` | enum | - | `us` | Two-letter App Store storefront |

```json
{
  "id": "310633997",
  "track_id": "310633997",
  "bundle_id": "net.whatsapp.WhatsApp",
  "title": "WhatsApp Messenger",
  "seller_name": "WhatsApp Inc.",
  "primary_genre": "Social Networking",
  "genres": ["Social Networking", "Utilities"],
  "price": 0,
  "formatted_price": "Free",
  "currency": "USD",
  "rating": 4.68698,
  "rating_current_version": 4.68698,
  "reviews_count": 18082844,
  "content_advisory_rating": "12+",
  "version": "26.22.76",
  "current_version_release_date": "2026-06-07T16:47:53Z",
  "release_date": "2009-05-04T02:43:49Z",
  "release_notes": "We update the app regularly to fix bugs…",
  "file_size_bytes": 370200576,
  "minimum_os_version": "15.1",
  "description": "WhatsApp from Meta is a free messaging and calling app…",
  "artwork_url": "https://is1-ssl.mzstatic.com/image/thumb/…",
  "screenshot_urls": [],
  "supported_devices": ["iPhone5s-iPhone5s", "iPadAir-iPadAir"],
  "languages": ["AR", "BN", "EN"],
  "advisories": ["Infrequent/Mild Profanity or Crude Humor"],
  "kind": "software",
  "url": "https://apps.apple.com/us/app/whatsapp-messenger/id310633997?uo=4",
  "seller_url": "http://www.whatsapp.com/",
  "artist_view_url": "https://apps.apple.com/us/developer/whatsapp-inc/id310634000?uo=4"
}
```

### `appstore.reviews`

A page of user reviews for a single Apple App Store app.

```http
GET https://api.chocodata.com/api/v1/appstore/reviews?api_key=YOUR_KEY&id=310633997&country=us&page=1&sort=mostRecent
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `id` | string | ✅ yes | - | The numeric App Store track ID |
| `country` | enum | - | `us` | Two-letter App Store storefront |
| `page` | int | - | `1` | Review page to fetch |
| `sort` | enum | - | `mostRecent` | `mostRecent` · `mostHelpful` |

```json
{
  "id": "310633997",
  "country": "us",
  "page": 1,
  "sort": "mostRecent",
  "total_results": 50,
  "reviews": [
    {
      "id": "14181597952",
      "author": "Arale soto aguilar",
      "author_url": "https://itunes.apple.com/us/reviews/id1642272657",
      "rating": 3,
      "title": "The new glitch",
      "body": "It is happening to me when I try to text the screen turns black…",
      "version": "26.22.76",
      "date": "2026-06-14T05:37:59-07:00",
      "vote_sum": 0,
      "vote_count": 0
    }
    /* 49 more reviews */
  ]
}
```

### `googleplay.product`

Full listing for a single Google Play app by package ID.

```http
GET https://api.chocodata.com/api/v1/googleplay/product?api_key=YOUR_KEY&id=com.whatsapp
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `id` | string | ✅ yes | - | The Android package ID (e.g. `com.whatsapp`) |
| `country` | enum | - | `us` | Two-letter Play storefront |
| `language` | string | - | region default | Content language as `xx_YY` |

```json
{
  "id": "com.whatsapp",
  "package_id": "com.whatsapp",
  "title": "WhatsApp Messenger",
  "developer": "WhatsApp LLC",
  "developer_url": "http://www.whatsapp.com/",
  "description": "Simple. Reliable. Private.",
  "category": "COMMUNICATION",
  "operating_system": "ANDROID",
  "content_rating": "Everyone",
  "price": 0,
  "currency": "USD",
  "is_free": true,
  "rating": 4.661951065063477,
  "reviews_count": 236991524,
  "icon": "https://play-lh.googleusercontent.com/Gqxk4T0uZsDwFp07DE-508hkyvcNmgF…",
  "screenshots": [
    "https://play-lh.googleusercontent.com/OBVqgRK7eerY0GPfK8AOzitu5oE9ecC6kG4kURTCb1K41gpqVsN0WjmJwJh-wX8…",
    "https://play-lh.googleusercontent.com/GQF4h3VL-kklOrVS_f1QRAJJZa2zQyVNcFbKdOIkvI_Pcu1op0Sy3uiry…"
  ],
  "url": "https://play.google.com/store/apps/details?id=com.whatsapp"
}
```

### `googleplay.search`

Search Google Play by keyword and get ranked app cards.

```http
GET https://api.chocodata.com/api/v1/googleplay/search?api_key=YOUR_KEY&q=whatsapp
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `q` | string | ✅ yes | - | Search keywords |
| `country` | enum | - | `us` | Two-letter Play storefront |
| `language` | string | - | region default | Content language as `xx_YY` |

```json
{
  "query": "whatsapp",
  "page": 1,
  "total_results": 20,
  "results": [
    {
      "position": 1,
      "id": "com.whatsapp",
      "package_id": "com.whatsapp",
      "title": "WhatsApp Messenger",
      "currency": "USD",
      "url": "https://play.google.com/store/apps/details?id=com.whatsapp"
    }
    /* 19 more cards */
  ]
}
```

## Social Media

Dedicated endpoints across Reddit, YouTube, TikTok, X (Twitter), Instagram, Facebook, and LinkedIn. Pull posts and their scores, video metadata, transcripts, comments, profiles, company pages, and job postings. Every call is the same `GET https://api.chocodata.com/api/v1/{target}/{resource}?api_key=YOUR_KEY&...` shape. See the [Scraping Social Media & App Stores guide](/docs/guides/social-and-app-stores) for an overview.

### `reddit.subreddit`

A page of posts from a subreddit, in a chosen sort order.
```http
GET https://api.chocodata.com/api/v1/reddit/subreddit?api_key=YOUR_KEY&subreddit=news&sort=hot&limit=5
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `subreddit` | string | ✅ yes | - | Subreddit name without the `r/` prefix |
| `sort` | enum | - | `hot` | `hot` · `new` · `top` · `rising` |
| `t` | enum | - | site default | Time window for `top` (`hour` · `day` · `week` · `month` · `year` · `all`) |
| `limit` | int | - | `25` | Number of posts to return |

```json
{
  "subreddit": "news",
  "sort": "hot",
  "total_results": 5,
  "posts": [
    {
      "id": "t3_1u6hmvz",
      "title": "Jeffco Public Schools says 61 boys found on girls' sports rosters were mascots, managers",
      "score": 13360,
      "num_comments": 352,
      "upvote_ratio": 0.9787,
      "awards": 0,
      "author": "HazyDavey68",
      "author_id": "t2_ayj7l5o9",
      "subreddit": "news",
      "permalink": "https://www.reddit.com/r/news/comments/1u6hmvz/…",
      "external_url": "https://www.denverpost.com/2026/06/13/…",
      "domain": "denverpost.com",
      "created": "2026-06-15T14:06:56.173000+0000"
    }
    /* 4 more posts */
  ]
}
```

### `reddit.post`

A single Reddit post plus its comment tree, with scores.
```http
GET https://api.chocodata.com/api/v1/reddit/post?api_key=YOUR_KEY&subreddit=news&post_id=1u6hmvz
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `subreddit` | string | ✅ yes | - | Subreddit name without the `r/` prefix |
| `post_id` | string | ✅ yes | - | The post's base-36 ID (with or without the `t3_` prefix) |
| `sort` | enum | - | `top` | Comment sort: `top` · `new` · `best` · `controversial` · `old` |

```json
{
  "post": {
    "id": "t3_1u6hmvz",
    "title": "Jeffco Public Schools says 61 boys found on girls' sports rosters were mascots, managers",
    "score": 13367,
    "num_comments": 353,
    "upvote_ratio": 0.9787,
    "author": "HazyDavey68",
    "subreddit": "news",
    "permalink": "https://www.reddit.com/r/news/comments/1u6hmvz/…",
    "external_url": "https://www.denverpost.com/2026/06/13/…",
    "domain": "denverpost.com",
    "body": null,
    "created": "2026-06-15T14:06:56.173000+0000"
  },
  "comments_count": 13,
  "sort": "top",
  "comments": [
    {
      "id": "t1_orsif6b",
      "parent_id": null,
      "depth": 0,
      "score": 4561,
      "author": "ByRWBadger",
      "body": "…",
      "created": "2026-06-15T14:12:53.614000+0000",
      "permalink": "https://www.reddit.com/r/news/comments/1u6hmvz/comment/orsif6b/"
    }
    /* 12 more comments */
  ]
}
```

### `reddit.user`

A user's public profile plus their recent posts and comments.
```http
GET https://api.chocodata.com/api/v1/reddit/user?api_key=YOUR_KEY&username=spez
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `username` | string | ✅ yes | - | Reddit username without the `u/` prefix |

```json
{
  "profile": {
    "username": "spez",
    "profile_url": "https://www.reddit.com/user/spez",
    "bio": "Reddit CEO",
    "icon": "https://www.redditstatic.com/icon.png/",
    "title": "overview for spez"
  },
  "total_results": 25,
  "items": [
    {
      "type": "comment",
      "id": "t1_optfyql",
      "short_id": "optfyql",
      "title": "/u/spez on Steve, Jen, and Drew here - Ask Us Anything!",
      "subreddit": "RDDT",
      "body": "…",
      "permalink": "https://www.reddit.com/r/RDDT/comments/1tvs5jj/…/optfyql/",
      "created": "2026-06-05T00:51:55+00:00"
    }
    /* 24 more items */
  ]
}
```

### `reddit.search`

Keyword search across Reddit. Results may be posts, comments, or subreddits.

```http
GET https://api.chocodata.com/api/v1/reddit/search?api_key=YOUR_KEY&q=climate
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `q` | string | ✅ yes | - | Search keywords |
| `subreddit` | string | - | - | Restrict the search to one subreddit |
| `sort` | enum | - | `relevance` | `relevance` · `hot` · `top` · `new` · `comments` |
| `t` | enum | - | site default | Time window (`hour` · `day` · `week` · `month` · `year` · `all`) |

```json
{
  "query": "climate",
  "subreddit": null,
  "sort": "relevance",
  "total_results": 25,
  "results": [
    {
      "position": 1,
      "id": "t5_2qhx3",
      "short_id": "2qhx3",
      "result_type": "subreddit",
      "title": "Information about the world's climate",
      "permalink": "https://www.reddit.com/r/climate/"
    }
    /* 24 more results */
  ]
}
```

### `youtube.video`

Metadata for a single YouTube video: title, description, view and like counts, channel, and related videos.
```http
GET https://api.chocodata.com/api/v1/youtube/video?api_key=YOUR_KEY&video_id=dQw4w9WgXcQ
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `video_id` | string | ✅ yes | - | The 11-character YouTube video ID |

```json
{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "type": "video",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "description": "The official video for “Never Gonna Give You Up”…",
  "view_count": 1783154138,
  "like_count": 19157745,
  "duration_seconds": 213,
  "keywords": ["rick astley", "never gonna give you up"],
  "category": "Music",
  "is_live": false,
  "is_family_safe": true,
  "publish_date": "2009-10-24T23:57:33-07:00",
  "channel_id": "UCuAXFkgsw1L7xaCfnd5JJOw",
  "channel_name": "Rick Astley",
  "channel_handle": "http://www.youtube.com/@RickAstleyYT",
  "channel_url": "https://www.youtube.com/channel/UCuAXFkgsw1L7xaCfnd5JJOw",
  "thumbnail": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hq720.jpg",
  "related_count": 12
}
```

### `youtube.channel`

A channel's profile and a page of its recent uploads.
```http
GET https://api.chocodata.com/api/v1/youtube/channel?api_key=YOUR_KEY&channel=@MrBeast
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `channel` | string | ✅ yes | - | A channel handle (`@MrBeast`), channel ID (`UC…`), or channel URL |

```json
{
  "channel": "MrBeast",
  "channel_id": "UCX6OQ3DkcsbYNE6H8uQQuVA",
  "channel_name": "MrBeast",
  "handle": "@MrBeast",
  "vanity_url": "http://www.youtube.com/@MrBeast",
  "url": "https://www.youtube.com/channel/UCX6OQ3DkcsbYNE6H8uQQuVA",
  "avatar": "https://yt3.googleusercontent.com/nxYrc_1_2f77DoBadyxMTmv7ZpRZapHR5jbuYe7…",
  "is_family_safe": true,
  "subscriber_count": 501000000,
  "subscriber_count_text": "501M subscribers",
  "video_count": 987,
  "videos": [
    {
      "position": 1,
      "id": "__fmDj0ZJ1Q",
      "title": "50 YouTube Legends Fight For $1,000,000",
      "url": "https://www.youtube.com/watch?v=__fmDj0ZJ1Q",
      "thumbnail": "https://i.ytimg.com/vi/__fmDj0ZJ1Q/hq720.jpg",
      "channel": "MrBeast",
      "views": "40,376,569 views",
      "published": "2d ago"
    }
    /* more videos */
  ]
}
```

### `youtube.comments`

A page of comments on a YouTube video, with author, like count, and reply count.
```http
GET https://api.chocodata.com/api/v1/youtube/comments?api_key=YOUR_KEY&video_id=dQw4w9WgXcQ
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `video_id` | string | ✅ yes | - | The 11-character YouTube video ID |
| `sort` | enum | - | `top` | `top` · `newest` |

```json
{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "sort_applied": "top",
  "results_count": 20,
  "comments": [
    {
      "id": "EhpVZ3pnZTM0MGRCZ0I3NWhXQm01NEFhQUJBZyAoKAE%3D",
      "text": "can confirm: he never gave us up",
      "author": "@YouTube",
      "author_channel_id": "UCBR8-60-B28hp2BmDPdntcQ",
      "author_is_verified": true,
      "published": "1 year ago",
      "like_count": "255K",
      "reply_count": "960"
    }
    /* 19 more comments */
  ],
  "continuation": "Eg0SC2RRdzR3OVdnWGNRGAYygAMK…"
}
```

### `youtube.transcript`

The timed transcript of a YouTube video, returned as timestamped segments and as one joined string.

```http
GET https://api.chocodata.com/api/v1/youtube/transcript?api_key=YOUR_KEY&video_id=dQw4w9WgXcQ
```

| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| `video_id` | string | ✅ yes | - | The 11-character YouTube video ID |
| `language` | string | - | default track | Preferred caption language code (e.g. `en`, `de-DE`); one of the `available_languages` |

```json
{
  "video_id": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "language": "en",
  "language_name": "English",
  "is_generated": false,
  "segment_count": 61,
  "available_languages": ["en", "de-DE", "ja", "pt-BR", "es-419"],
  "segments": [
    { "text": "[♪♪♪]", "start": 1360, "duration": 1680 },
    { "text": "♪ We're no strangers to love ♪", "start": 18640, "duration": 3240 }
    /* 59 more segments */
  ],
  "text": "[♪♪♪] ♪ We're no strangers to love ♪ ♪ You know the rules and so do I ♪…"
}
```

### `tiktok.video`

Metadata and engagement stats for a single TikTok video.
```http GET https://api.chocodata.com/api/v1/tiktok/video?api_key=YOUR_KEY&url=https://www.tiktok.com/@tiktok/video/7106594312292453675 ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `url` | string | ✅ yes | - | Full TikTok video URL (a bare numeric video ID is also accepted) | ```json { "id": "7106594312292453675", "url": "https://www.tiktok.com/@tiktok/video/7106594312292453675", "socialPlatform": "tiktok", "title": "how many frogs did you find? 🐸 check out tiktok's #Minecraft community today! @Gorillo", "description": "how many frogs did you find? 🐸 check out tiktok's #Minecraft community today! @Gorillo", "create_time": 1654632929, "created_at": "2022-06-07T20:15:29.000Z", "author": { "id": "107955", "uniqueId": "tiktok", "nickname": "TikTok", "verified": true, "signature": "One TikTok can make a big impact", "url": "https://www.tiktok.com/@tiktok" }, "stats": { "plays": 563500, "likes": 98700, "comments": 1339, "shares": 127, "saves": 58626 }, "hashtags": ["Minecraft"], "thumbnail": "https://p16-common-sign.tiktokcdn-us.com/…", "music": { "title": "original sound", "author_name": "TikTok" } } ``` ### `tiktok.profile` A TikTok creator's public profile and aggregate stats. 
```http GET https://api.chocodata.com/api/v1/tiktok/profile?api_key=YOUR_KEY&username=tiktok ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `username` | string | ✅ yes | - | TikTok handle without the `@` prefix | ```json { "uniqueId": "tiktok", "nickname": "TikTok", "signature": "One TikTok can make a big impact", "verified": true, "secUid": "MS4wLjABAAAAv7iSuuXDJGDvJkmH_vz1qkDZYo1apxgzaxdBSeIuPiM", "id": "107955", "create_time": 1425144149, "created_at": "2015-02-28T17:22:29.000Z", "avatar": "https://p16-common-sign.tiktokcdn-us.com/…", "url": "https://www.tiktok.com/@tiktok", "socialPlatform": "tiktok", "stats": { "followerCount": 94400000, "followingCount": 3, "heartCount": 459900000, "videoCount": 1452 } } ``` ### `tiktok.oembed` The lightweight oEmbed record for a TikTok video: title, author, thumbnail, and ready-to-paste embed HTML. ```http GET https://api.chocodata.com/api/v1/tiktok/oembed?api_key=YOUR_KEY&url=https://www.tiktok.com/@tiktok/video/7106594312292453675 ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `url` | string | ✅ yes | - | Full TikTok video URL | ```json { "id": "7106594312292453675", "url": "https://www.tiktok.com/@tiktok/video/7106594312292453675", "socialPlatform": "tiktok", "type": "video", "version": "1.0", "title": "how many frogs did you find? 🐸 check out tiktok's #Minecraft community today! @Gorillo", "author_name": "TikTok", "author_unique_id": "tiktok", "author_url": "https://www.tiktok.com/@tiktok", "provider_name": "TikTok", "provider_url": "https://www.tiktok.com", "thumbnail_url": "https://p16-common-sign.tiktokcdn.com/…", "thumbnail_width": 576, "thumbnail_height": 1024, "embed_type": "video", "html": "
" } ``` ### `xtwitter.tweet` A single tweet (post) from X with text, engagement counts, and author. ```http GET https://api.chocodata.com/api/v1/xtwitter/tweet?api_key=YOUR_KEY&id=20 ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `id` | string | ✅ yes | - | The numeric tweet (post) ID | ```json { "id": "20", "id_str": "20", "url": "https://x.com/jack/status/20", "socialPlatform": "twitter", "text": "just setting up my twttr", "full_text": "just setting up my twttr", "lang": "en", "created_at": "2006-03-21T20:50:14.000Z", "favorite_count": 311977, "reply_count": 17945, "conversation_count": 17945, "is_edited": false, "user": { "screen_name": "jack", "name": "jack", "id_str": "12", "is_blue_verified": true, "profile_image_url_https": "https://pbs.twimg.com/profile_images/…/azNjKOSH_normal.jpg", "url": "https://x.com/jack" }, "entities": { "urls": [], "user_mentions": [], "hashtags": [], "symbols": [] }, "media": [] } ``` ### `instagram.profile` An Instagram account's public profile and counts. ```http GET https://api.chocodata.com/api/v1/instagram/profile?api_key=YOUR_KEY&username=nasa ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `username` | string | ✅ yes | - | Instagram handle without the `@` prefix | ```json { "id": "528817151", "username": "nasa", "full_name": "NASA", "url": "https://www.instagram.com/nasa/", "biography": "Making the seemingly impossible, possible. ✨", "profile_pic_url": "https://scontent.cdninstagram.com/v/t51.2885-19/…", "is_verified": true, "is_private": false, "follower_count": 104408885, "following_count": 91, "posts_count": 4818, "og_description": "104M Followers, 95 Following, 4,818 Posts - See Instagram photos and videos from NASA" } ``` ### `instagram.post` A single Instagram post by its shortcode: caption, media, author, and engagement. 
```http GET https://api.chocodata.com/api/v1/instagram/post?api_key=YOUR_KEY&shortcode=DWm8OQKlKvC ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `shortcode` | string | ✅ yes | - | The post shortcode (the segment after `/p/` in the URL) | ```json { "id": "DWm8OQKlKvC", "shortcode": "DWm8OQKlKvC", "media_id": "3866042192364874690", "url": "https://www.instagram.com/p/DWm8OQKlKvC/", "media_type": "carousel", "is_video": false, "title": "NASA launches Artemis II to the moon!", "author": "agpfoto", "author_id": "19918380", "author_url": "https://www.instagram.com/agpfoto/", "images": [ "https://scontent.cdninstagram.com/v/t51.82787-15/…" ], "thumbnail": "https://scontent.cdninstagram.com/v/t51.82787-15/…", "dimensions": { "width": 1080, "height": 1440 }, "caption": "NASA launches Artemis II to the moon!", "hashtags": [], "comments": 1079, "taken_at": "2026-04-02T00:02:44.000Z", "location": { "id": "254918491", "name": "Launch Complex 39 Press Site", "slug": "launch-complex-39-press-site" } } ``` ### `facebook.page` Search Facebook for pages by name and get matching page cards with like and follower counts. 
```http GET https://api.chocodata.com/api/v1/facebook/page?api_key=YOUR_KEY&page=NASA ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `page` | string | ✅ yes | - | A page name or query | ```json { "query": "NASA", "page": 1, "results_count": 8, "total_results": 8, "results": [ { "position": 1, "id": "100044561550831", "title": "NASA - National Aeronautics and Space Administration", "url": "https://www.facebook.com/NASA/", "currency": "USD", "thumbnail": "https://scontent.xx.fbcdn.net/v/t39.30808-1/…", "type": "page", "page": "NASA", "page_id": "100044561550831", "name": "NASA - National Aeronautics and Space Administration", "tagline": "Explore the universe and discover our home planet.", "image": "https://scontent.xx.fbcdn.net/v/t39.30808-1/…", "likes": 28622651, "followers": 28622651, "talking_about_count": 141519 } /* 7 more page cards */ ] } ``` ### `facebook.post` A single Facebook post by URL: author, caption, media, and engagement counts. ```http GET https://api.chocodata.com/api/v1/facebook/post?api_key=YOUR_KEY&url=https://www.facebook.com/{page}/posts/{post_id} ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `url` | string | ✅ yes | - | Full Facebook post URL | ```json { "id": "1148...", "page": "NASA", "page_id": "100044561550831", "post_id": "1148...", "url": "https://www.facebook.com/NASA/posts/1148…", "author": "NASA - National Aeronautics and Space Administration", "caption": "Explore the universe and discover our home planet.", "title": "Explore the universe and discover our home planet.", "image": "https://scontent.xx.fbcdn.net/v/t39.30808-1/…", "reactions_count": 12840, "comments_count": 321, "shares_count": 188 } ``` ### `linkedin.jobsearch` Search LinkedIn job postings by keywords and location. 
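This endpoint pages with a `start` result offset (see the parameter table below). The sketch below shows one way to walk several pages. It assumes that advancing `start` by the number of jobs returned yields the next page and that an empty `jobs` array marks the end; neither is stated explicitly in these docs. The HTTP call is injected as `fetch_page` so the loop can be exercised offline, and `fake_fetch` is a stand-in, not real API output.

```python
from typing import Callable, Dict, Iterator


def iter_jobs(fetch_page: Callable[[int], Dict], max_jobs: int = 50) -> Iterator[Dict]:
    """Yield job cards across pages until max_jobs or an empty page."""
    start = 0
    yielded = 0
    while yielded < max_jobs:
        page = fetch_page(start)
        jobs = page.get("jobs") or []
        if not jobs:
            return  # assumed end-of-results signal
        for job in jobs:
            yield job
            yielded += 1
            if yielded >= max_jobs:
                return
        start += len(jobs)  # assumed: offset counts results, not pages


# Offline stand-in for the real request: 25 placeholder jobs, 10 per page.
_FAKE_JOBS = [{"position": i + 1, "job_id": f"job-{i + 1}"} for i in range(25)]


def fake_fetch(start: int) -> Dict:
    return {"start": start, "jobs": _FAKE_JOBS[start:start + 10]}


first_22 = list(iter_jobs(fake_fetch, max_jobs=22))
```

In real use `fetch_page` would issue the GET request with the given `start` and return the decoded JSON. Each page counts as a separate request.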
```http GET https://api.chocodata.com/api/v1/linkedin/jobsearch?api_key=YOUR_KEY&keywords=software%20engineer&location=United%20States ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `keywords` | string | ✅ yes | - | Job title or keywords | | `location` | string | - | - | Location to scope the search to | | `start` | int | - | `0` | Result offset for paging | ```json { "query": "software engineer", "keywords": "software engineer", "location": "United States", "start": 0, "page": 1, "total_results": 10, "jobs": [ { "position": 1, "id": "4407498584", "job_id": "4407498584", "title": "Software Engineer, New Grad (AI)", "url": "https://www.linkedin.com/jobs/view/software-engineer-new-grad-ai-at-notion-4407498584", "currency": "USD", "company": "Notion", "company_url": "https://www.linkedin.com/company/notionhq", "location": "San Francisco, CA", "posted_date": "2026-06-09", "posted_label": "6 days ago", "company_logo": "https://media.licdn.com/dms/image/v2/…/notionhq_logo" } /* 9 more jobs */ ] } ``` ### `linkedin.job` The full detail of a single LinkedIn job posting. 
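In the sample response for this endpoint, `salary` is a display string (`"$160,000.00 - $250,000.00"`), not a pair of numbers. The sketch below splits such a range into its bounds. It assumes the `$low - $high` layout seen in that single sample and returns `None` for anything else (hourly rates, single figures, other currencies). `parse_salary_range` is a hypothetical helper.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# One amount: optional "$", digits with optional thousands separators/decimals.
_AMOUNT = r"\$?\s*(\d[\d,]*(?:\.\d+)?)"
_RANGE_RE = re.compile(rf"^\s*{_AMOUNT}\s*-\s*{_AMOUNT}\s*$")


def parse_salary_range(salary) -> Optional[Tuple[Decimal, Decimal]]:
    """Split '$160,000.00 - $250,000.00' into (low, high), else None."""
    if not isinstance(salary, str):
        return None
    match = _RANGE_RE.match(salary)
    if match is None:
        return None
    low, high = (Decimal(part.replace(",", "")) for part in match.groups())
    return (low, high)


bounds = parse_salary_range("$160,000.00 - $250,000.00")
```

`Decimal` keeps the amounts exact. The currency and pay period are not encoded in the string, so they would have to come from elsewhere.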
```http GET https://api.chocodata.com/api/v1/linkedin/job?api_key=YOUR_KEY&job_id=4407498584 ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `job_id` | string | ✅ yes | - | The numeric LinkedIn job ID | ```json { "id": "4407498584", "job_id": "4407498584", "title": "Software Engineer, New Grad (AI)", "url": "https://www.linkedin.com/jobs/view/4407498584", "company": "Notion", "company_url": "https://www.linkedin.com/company/notionhq", "location": "San Francisco, CA", "applicants": "Over 200 applicants", "posted_label": "6 days ago", "salary": "$160,000.00 - $250,000.00", "seniority": "Entry level", "employment_type": "Full-time", "job_function": "Engineering and Information Technology", "industries": "Software Development", "description": "Who We Are Notion is the collaborative AI workspace where teams and agents think together…", "company_logo": "https://media.licdn.com/dms/image/v2/…/notionhq_logo" } ``` ### `linkedin.company` A LinkedIn company page: industry, size, headquarters, and specialties. ```http GET https://api.chocodata.com/api/v1/linkedin/company?api_key=YOUR_KEY&company=microsoft ``` | Param | Type | Required | Default | Description | |---|---|---|---|---| | `company` | string | ✅ yes | - | The company's LinkedIn vanity name or numeric ID | ```json { "id": "microsoft", "name": "Microsoft", "url": "https://www.linkedin.com/company/microsoft", "followers": 28379789, "employee_count": 232340, "industry": "Software Development", "company_size": "10,001+ employees", "headquarters": "Redmond, Washington", "website": "https://news.microsoft.com/", "description": "Every company has a mission. What's ours? 
To empower every person and every organization to achieve more…",
  "specialties": "Business Software, Developer Tools, Cloud Computing, AI, Machine Learning…",
  "logo": "https://media.licdn.com/dms/image/v2/…/microsoft_logo"
}
```

## Find the exact endpoint for any site

There are three ways to look up the precise path, parameters, and response for a specific site.

### 1. The scraper directory

The full browsable catalogue of all 453 endpoints lives at **[/scraper-api](/scraper-api)**. Search by brand or category (for example `zillow`, `linkedin`, `finance`), and open any card for that endpoint's request shape, parameters, and an example. Cards marked **JSON** return structured data from a dedicated endpoint; cards marked **via Universal** route through the [Universal Web Scraper API](/docs/endpoints/universal).

### 2. The dashboard playground

The **[dashboard playground](https://app.chocodata.com)** lets you run any endpoint live against your key, see the real response, and copy a ready-made cURL / Node / Python snippet. It is the fastest way to confirm field names and try parameters before you write code.

### 3. The machine-readable corpus

For coding agents and LLMs, the full docs are available as [`/docs/llms.txt`](/docs/llms.txt) (manifest) and [`/docs/llms-full.txt`](/docs/llms-full.txt) (single-file concatenation). Append `.md` to any doc URL for raw Markdown.

## Anything not in the directory

If a site or page type has no dedicated endpoint yet, you are not blocked: pass its URL to the [Universal Web Scraper API](/docs/endpoints/universal) and get back HTML, text, or auto-extracted JSON. The Universal endpoint covers the entire web, so the directory is a list of where we have a hand-tuned parser, not a list of what you can scrape.

## Related

- [Core concepts](/docs/core-concepts) - the model behind every endpoint.
- [Product endpoint](/docs/endpoints/product) - specific-item pattern in depth.
- [Search endpoint](/docs/endpoints/search) - list pattern in depth.
- [Universal Web Scraper API](/docs/endpoints/universal) - scrape any URL. - [Batch endpoint](/docs/endpoints/batch) - run many inputs against any endpoint asynchronously. --- # Changelog _Source: https://chocodata.com/docs/changelog_ # Changelog Breaking API changes are announced 14 days in advance via email + this page. In-flight requests are always settled under the old contract. --- ## 2026-06-11 **Docs** - Restructured the docs around the one-request-shape model. New pages: [Core concepts](/docs/core-concepts) (how the API works, dedicated endpoints vs the Universal scraper, the credit model) and [Endpoint reference](/docs/endpoint-reference) (how 453 endpoints share one shape and how to find any of them). - New [Universal Web Scraper API](/docs/endpoints/universal) page: scrape any URL to JSON, HTML, or text via `/api/v1/universal/get`. - Getting started rewritten to lead with the generic `{site}/{resource}` pattern and a non-marketplace-specific path. Error codes and billing generalized across all verticals (not just commerce). - The full browsable catalogue of all endpoints now lives at [/scraper-api](/scraper-api). --- ## 2026-05-19 **API** - New error code: `404 product_not_found`. When a target site returns a 404 for the requested item (delisted, ID malformed, or never existed in the chosen region), the API now returns a clean `404` with `{"error":"product_not_found","retryable":false}` instead of the misleading `502 target_unreachable` it returned before. SDK consumers should treat 404 as a terminal failure; the item won't come back. - Skip the redundant retry strategy on confirmed 404s. Saves 1 paid-residential rotation per delisted item. - See updated [error codes table](/docs/guides/errors) and [billing policy](/docs/guides/billing) (404s remain free). 
- Bot-manager interstitial detection (rolled out on the Amazon search target first): the extractor now recognizes the Akamai bot-manager interstitial (a 2-3 KB JS shell carrying a `bm-verify` token and a `triggerInterstitialChallenge` script). Previously these silent challenges slipped past the captcha detector, the extractor saw zero results, and the request returned `502 extraction_failed`. They now trigger the standard residential-IP rotation; you weren't charged before and still aren't, but the failure is now both rarer (rotation almost always recovers) and more honest when it does happen. - Search endpoints add an optional `no_results: true` flag (with `total_results: 0`) for the rare case where a site returns a real search page with zero matches. Pre-patch this surfaced as `502 extraction_failed`; it now correctly returns HTTP 200 with an empty `products` array. - Marketplace search field-extraction fixes (5 fields on every result card; shipped on the Amazon target): - `reviews_count`: now populated. Pre-fix this was always `null`; the extractor was looking at a markup class the site hadn't shipped in months. Source of truth is now the `aria-label` of the rating wrapper ("5,054 ratings"), which gives the precise integer rather than the rounded "(5K)" display value. Coverage on a typical search page: ~95-100%; `null` only when the site hides reviews on brand-new listings. - `price_strikethrough`: no longer captures per-unit prices. The historic bug surfaced as `0.23` on a cologne card (which is the `$0.23/milliliter` per-unit price, not a list price). The fix requires a strikethrough marker and rejects any candidate whose surrounding text contains `/fluid ounce`, `/milliliter`, `/Fl Oz`, `/Count`, `/Pound`, `/Ounce`, `/Each`, `/Pack`, etc. Strikethrough is now `null` when the product isn't on sale, and strictly greater than the current price when it is. 
- `highest_price`: mirrors the fixed strikethrough on search cards (real multi-tier ranges only surface on product pages). - `sales_volume`: now populated. Returns the verbatim site string (`"5K+ bought in past month"`, `"200+ bought in past week"`). Coverage on popular categories: ~80-100%; `null` when the site doesn't surface a velocity badge (low-traffic categories). - `organic_position`: guaranteed 1-indexed across non-sponsored cards. Sponsored cards get `sponsored_position` instead. The first organic card after any run of sponsored cards still gets position 1. --- ## 2026-04-21 **Dashboard** - `/app/settings/billing`: PAYG top-up calculator now updates the "Credits you'll get" and "Basic scrapes" fields live as you type. - Loading skeletons added to dashboard home, billing, usage, and API keys routes - no more blank flash on first navigation. - Sidebar "Documentation" link now points at the new `/docs` site. **Docs** - Launched `/docs` on the marketing site: Getting started, endpoint pages (product / search / batch), guides (auth, errors, rate limits, SDKs, billing, country & language), and this changelog. - Added `/docs/llms.txt`, `/docs/llms-full.txt`, and an `.md` variant for every doc URL so LLMs / coding agents can ingest the full corpus cheaply. **SDKs** - `chocodata` (Node) v0.1.4 - README updated with 5-credit pricing, removed claims about "no credit system". - `chocodata` (Python) v0.1.4 - same. - `chocodata-go` v0.1.4 - same. --- ## 2026-04-16 **API** - Credit rebase: 1 basic scrape is now priced at **5 credits** (was 1). Plan allowances and PAYG packages scaled x5 accordingly - your dollar-cost-per-scrape is unchanged. The "1 credit = 1 scrape" pre-launch shorthand only applied to internal testing. - Headers prefix changed from `Spb-*` to `Asa-*`. Old prefix will keep returning values for 90 days, then be removed. **Dashboard** - Live credit balance now read from authoritative ledger on every page load - no more stale numbers from cached mirrors. 
- Monthly usage graph restyled with visible bars even on zero-credit days.

---

## 2026-04-10

**API**

- Shipped production success-rate improvements for the Amazon target's latest A/B layout. Measured success rate rose from ~87% to ~97% on a 30-query mixed international set.
- Parser hardening: gift-card and subscription-plan product templates now extract correctly instead of hitting `extraction_failed`.

---

## 2026-04-01

**API**

- `render_js` and `screenshot` query params reserved. Passing either returns `501 not_implemented` today; the real implementation is on the roadmap for Q3.
- New response header `Asa-Attempts` reports how many internal retries we used to fetch your page.

**Billing**

- Non-2xx responses are now *guaranteed* free - no edge case where a partial 502 is charged. See [Billing policy](/docs/guides/billing).

---

## Earlier

Older entries predate the public launch and aren't preserved here. If you need historical info (old behaviour of a specific endpoint), email us and we'll pull it from internal records.

## Related

- [Getting started](/docs/getting-started)
- [Billing policy](/docs/guides/billing)

---

# Product endpoint

_Source: https://chocodata.com/docs/endpoints/product_

# `GET /api/v1/{site}/product`

Returns structured data for a single product page on any supported e-commerce site. `product` is one resource type out of many: the same request and response shape covers the other dedicated specific-item endpoints (`article`, `job`, `listing`, `profile`, `property`, `video`, and more). Every endpoint follows the same `/api/v1/{site}/{resource}?api_key=YOUR_KEY&query=...` shape, so the patterns here carry over to the rest of the [453-endpoint catalogue](/scraper-api).

This page is the canonical **specific-item pattern**: one identifier in, one rich JSON object out. The example below uses `walmart` as the site; swap it for any other supported store (`ebay`, `target`, `bestbuy`, `etsy`, and many more).
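Because every dedicated endpoint shares the `/api/v1/{site}/{resource}` shape described above, one small URL builder can cover all of them. The sketch below is an illustration using only the standard library; `endpoint_url` is a hypothetical helper rather than an SDK function, and `ITEM_ID` and the `co.uk` storefront are placeholders, not values confirmed for eBay.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.chocodata.com/api/v1"


def endpoint_url(site: str, resource: str, api_key: str, **params) -> str:
    """Build a request URL for any {site}/{resource} pair."""
    query = {"api_key": api_key}
    # Drop unset optional parameters so they are not sent as "None".
    query.update({name: value for name, value in params.items() if value is not None})
    return f"{API_ROOT}/{site}/{resource}?{urlencode(query)}"


# Same call shape, different site: only the path segment changes.
walmart_url = endpoint_url("walmart", "product", "YOUR_KEY", query="5085206428")
ebay_url = endpoint_url("ebay", "product", "YOUR_KEY", query="ITEM_ID", domain="co.uk")
```

`urlencode` percent-encodes parameter values, which matters when `query` is a full product URL rather than a bare identifier.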
## Request ```http GET /api/v1/walmart/product?query=5085206428&api_key=cd_live_YOUR_KEY ``` ### Query parameters | Param | Type | Required | Default | Description | |---|---|---|---|---| | `query` | string | ✅ yes | - | The product identifier or full product URL for the chosen site | | `domain` | enum | - | site default | Regional storefront for sites that have one (e.g. `com`, `co.uk`, `de`). Supported values vary by site. | | `language` | string | - | region default | Content language as `xx_YY` (e.g. `en_US`, `de_DE`, `es_ES`). See [supported languages per region](/docs/guides/country-and-language). | | `add_html` | boolean | - | `false` | Attach the raw HTML of the page under `html` in the response | | `render_js` | boolean | - | - | Coming soon - returns 501 today | | `screenshot` | boolean | - | - | Coming soon - returns 501 today | ### Response (200) ```json { "id": "5085206428", "id_in_url": "5085206428", "parent_id": null, "url": "https://www.walmart.com/ip/5085206428", "page": 1, "page_type": "Product", "title": "LEGEND COOKWARE 5-Ply Stainless Steel Cookware Set", "product_name": "LEGEND COOKWARE 5-Ply Stainless Steel Cookware Set", "description": "Premium 5-ply construction distributes heat evenly …", "bullet_points": "5-PLY STAINLESS STEEL COOKWARE HEATS EVENLY\\n…", "brand": "LEGEND COOKWARE", "manufacturer": "LEGEND COOKWARE", "store_url": "https://www.walmart.com/seller/LEGEND+COOKWARE", "price": 299.99, "price_main": 299.99, "price_strikethrough": 399.99, "currency": "USD", "pricing_count": 3, "stock": "In Stock", "max_quantity": 10, "delivery": [ { "date": { "by": "Tuesday, April 25" }, "type": "FREE Delivery" } ], "featured_merchant": { "name": "LEGEND COOKWARE Store", "seller_id": "A1EXAMPLE", "is_fulfilled_by_site": true }, "offers": [ { "price": 299.99, "seller_name": "LEGEND COOKWARE Store", "seller_id": "A1EXAMPLE", "stock": "In Stock", "returns": "Free returns within 30 days" } ], "category": [ { "ladder": [ { "name": "Home & Kitchen", 
"url": "…" }, { "name": "Kitchen & Dining", "url": "…" }, { "name": "Cookware", "url": "…" } ]} ], "rating": 4.7, "reviews_count": 2481, "rating_stars_distribution": [ { "rating": 5, "percentage": 72 }, { "rating": 4, "percentage": 18 }, { "rating": 3, "percentage": 6 }, { "rating": 2, "percentage": 2 }, { "rating": 1, "percentage": 2 } ], "reviews": [ { "id": "R1…", "author": "J. Smith", "title": "Excellent build quality", "content": "These pans heat evenly and clean up beautifully.", "rating": 5, "timestamp": "Reviewed in the United States on April 12, 2026", "is_verified": true, "helpful_count": 12 } /* ~8 more */ ], "answered_questions_count": 23, "variations": [ { "id": "5085206428", "dimensions": { "color": "Silver", "size": "10 pc" }, "selected": true }, { "id": "5085206429", "dimensions": { "color": "Copper", "size": "10 pc" }, "selected": false } ], "images": [ "https://cdn.example.com/images/cookware-set-71…", "https://cdn.example.com/images/cookware-set-81…" ], "has_videos": true, "product_details": { "Brand": "LEGEND COOKWARE", "Material": "Stainless Steel" }, "product_dimensions": "17.1 x 13.1 x 12.6 inches", "sales_rank": [ { "ladder": [{ "name": "Home & Kitchen", "url": "…" }], "rank": 342 } ] } ``` ### Response headers - `Asa-Cost` - credits spent on this request (`5` for a standard product scrape, `0` for any non-2xx response) - `Asa-Resolved-Url` - the final target URL after any redirects - `Asa-Source-Status` - the target site's raw HTTP status (may differ from our HTTP status) - `Asa-Attempts` - how many internal attempts it took us to get this result - `Asa-Extractor-Version` - e.g. 
`walmart@1.0.0`

## Errors

| HTTP | Body | Reason |
|---|---|---|
| 400 | `{"error":"invalid_params"}` | Query identifier malformed, or `domain` not valid for this site |
| 401 | `{"error":"unauthorized"}` | Missing / bad API key |
| 404 | `{"error":"product_not_found","retryable":false}` | The target site returned 404 for the item: delisted, malformed, or never existed in this region. No retry will help. |
| 429 | `{"error":"rate_limited"}` | Too many requests from your key |
| 501 | `{"error":"not_implemented","params":["render_js"]}` | You passed a roadmap param |
| 502 | `{"error":"target_unreachable"}` | The target site blocked every internal retry we made (transient anti-bot pressure on hot items) |
| 502 | `{"error":"extraction_failed"}` | The target site served something we couldn't parse |
| 502 | `{"error":"id_mismatch"}` | The requested item redirected to a different product |
| 502 | `{"error":"generic_gallery_page"}` | The target served a placeholder gallery on a 200 response (different from the 404 case above; same outcome: drop the item) |

## Related

- [Endpoint reference](/docs/endpoint-reference) - the resource model this endpoint is one example of
- [Search endpoint](/docs/endpoints/search) - the list pattern
- [Universal Web Scraper API](/docs/endpoints/universal) - for sites without a dedicated endpoint
- [Async batch endpoint](/docs/endpoints/batch)
- [Country + content language guide](/docs/guides/country-and-language)
- [Billing policy - only-2xx billing](/docs/guides/billing)

---

# Search endpoint

_Source: https://chocodata.com/docs/endpoints/search_

# `GET /api/v1/{site}/search`

Returns ranked search results for a keyword query. `search` is the most widely available resource: it works on search engines (`google`, `bing`), marketplaces (`walmart`, `ebay`), job boards (`indeed`), app stores, and more.

This page is the canonical **list pattern**: a keyword in, ranked results out.
The example below uses `google`; swap the site for any other that exposes a `search` endpoint. Marketplace search cards include commerce fields (price, rating, sponsored flags); search-engine results return web fields (title, url, snippet). The shape adapts to the target. ## Request ```http GET /api/v1/google/search?query=wireless+headphones&api_key=cd_live_YOUR_KEY ``` ### Query parameters | Param | Type | Required | Default | Description | |---|---|---|---|---| | `query` | string | ✅ yes | - | Search keywords | | `domain` | enum | - | site default | Regional storefront / locale for sites that have one | | `sort_by` | enum | - | `best_match` | Marketplace sort: `best_match` · `price_asc` · `price_desc` · `avg_customer_review` · `newest` (where the site supports it) | | `start_page` | int | - | `1` | Page to start from (1-10) | | `pages` | int | - | `1` | Number of consecutive pages to fetch (1-10) - each counts as a request | ### Response (200) - marketplace search For a marketplace site, each card carries commerce fields: ```json { "page": 1, "products": [ { "id": "5085206428", "title": "Acqua Di Gio By Giorgio Armani For Men. 
Eau De Toilette Spray 3.4 Fl Oz", "url": "https://www.walmart.com/ip/5085206428", "price": 19.57, "price_strikethrough": null, "highest_price": null, "currency": "USD", "rating": 4.5, "reviews_count": 150532, "sales_volume": "20K+ bought in past month", "image": "https://cdn.example.com/images/acqua-di-gio-71…", "is_sponsored": false, "best_seller": true, "organic_position": 1, "sponsored_position": null, "shipping_information": null, "pricing_count": null, "manufacturer": null, "is_video": false } ], "html": "" } ``` ### Response (200) - search engine For a search engine like Google, each result carries web fields: ```json { "query": "wireless headphones", "results": [ { "position": 1, "title": "The 6 Best Wireless Headphones of 2026, Tested", "url": "https://www.example.com/best-wireless-headphones", "snippet": "We tested 30 pairs over three months to find the…" } ], "related_searches": ["noise cancelling headphones", "wireless earbuds"], "total_results": 48300000 } ``` ### Field notes (marketplace cards) | Field | Notes | |---|---| | `price_strikethrough` | The list price ("was") when the product is on sale. `null` when the product is at its normal price. We explicitly filter out per-unit prices (`$0.23/fluid ounce`, `$11.15/milliliter`, etc.) so this only ever holds a real discount anchor. | | `highest_price` | Same value as `price_strikethrough` on the search card. Search cards never expose a multi-tier price range (only product pages do), so the strikethrough is the implicit upper bound. | | `reviews_count` | Precise integer count parsed from the rating wrapper (e.g. `"5,054 ratings"`). Not the rounded `(5K)` display value. `null` when the site hides reviews (typical for brand-new listings). | | `sales_volume` | Verbatim site string, e.g. `"5K+ bought in past month"`, `"200+ bought in past week"`. `null` when the site doesn't surface a badge for that card (low-velocity categories often omit it). 
|
| `organic_position` | 1-indexed rank among non-sponsored cards. Sponsored cards have `organic_position: null` and a positive `sponsored_position` instead. The first organic card after any run of sponsored cards still gets `organic_position: 1`. |

### Empty-result response (200)

When a site legitimately returns zero matches for a query, the response stays HTTP 200 with an empty results array and a `no_results: true` flag, so you can tell "the site returned nothing" from "we failed to scrape":

```json
{
  "page": 1,
  "products": [],
  "no_results": true,
  "total_results": 0,
  "html": null
}
```

## Common errors

Same as the product endpoint. `502 target_unreachable` on international storefronts is more common than on the default region - see our [country routing guide](/docs/guides/country-and-language) and the full [error codes](/docs/guides/errors) list.

## Related

- [Endpoint reference](/docs/endpoint-reference) - the resource model
- [Product endpoint](/docs/endpoints/product) - the specific-item pattern
- [Universal Web Scraper API](/docs/endpoints/universal) - for sites without a dedicated endpoint
- [Batch a hundred search queries](/docs/endpoints/batch)

---

# Universal Web Scraper API

_Source: https://chocodata.com/docs/endpoints/universal_

# `GET /api/v1/universal/get`

The Universal Web Scraper API fetches any URL on the web and returns it in the format you ask for. Use it for sites or page types that do not have a [dedicated endpoint](/docs/endpoint-reference), or any time you just want the raw page (or auto-extracted JSON) from an arbitrary URL.

You pass a `url`, Chocodata handles the proxies, headers, and anti-bot, and you get back the page. It is the same engine as the dedicated endpoints, exposed as a general-purpose fetcher.
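Since the target page is passed as a query parameter, a `url` that carries its own `?`, `&`, or spaces has to be percent-encoded, otherwise its parameters would be read as parameters of the Chocodata request. The sketch below shows one way to build such a request URL with the standard library. `universal_url` is a hypothetical helper; the `parse` and `country` names come from this page's parameter table.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

UNIVERSAL_ENDPOINT = "https://api.chocodata.com/api/v1/universal/get"
PARSE_MODES = ("auto", "html", "text", "json")


def universal_url(api_key, target_url, parse="auto", country=None):
    """Build a Universal request URL with the target URL safely encoded."""
    if parse not in PARSE_MODES:
        raise ValueError(f"parse must be one of {PARSE_MODES}, got {parse!r}")
    params = {"api_key": api_key, "url": target_url, "parse": parse}
    if country is not None:
        params["country"] = country
    # urlencode percent-encodes "?", "&", "=" and spaces inside each value.
    return f"{UNIVERSAL_ENDPOINT}?{urlencode(params)}"


target = "https://example.com/search?q=supply chain&page=2"
request_url = universal_url("cd_live_YOUR_KEY", target, parse="text", country="de")

# Decoding the query string recovers the target URL unchanged.
round_trip = parse_qs(urlsplit(request_url).query)["url"][0]
```

Without the encoding step, `page=2` in the target would be parsed as a separate parameter of the outer request instead of reaching the target site.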
## Request ```http GET /api/v1/universal/get?url=https://example.com/article/123&api_key=cd_live_YOUR_KEY ``` ```bash curl "https://api.chocodata.com/api/v1/universal/get?api_key=cd_live_YOUR_KEY&url=https://example.com/article/123&parse=auto" ``` ### Query parameters | Param | Type | Required | Default | Description | |---|---|---|---|---| | `url` | string | yes | - | The full URL to fetch. URL-encode it if it contains `&`, `?`, or spaces. | | `parse` | enum | - | `auto` | Output format: `auto` (best-effort structured JSON), `html` (raw rendered HTML), `text` (readable plain text), `json` (alias for `auto`). | | `country` | string | - | auto | Two-letter country code to force proxy egress (`us`, `de`, `gb`). Omit to let us choose. | ## Parse modes The `parse` parameter decides what you get back: | `parse` | Returns | Use when | |---|---|---| | `auto` | Structured JSON: title, main text, links, metadata extracted from the page | You want clean fields without writing selectors | | `html` | The full rendered HTML of the page | You want to run your own parser / selectors | | `text` | Boilerplate-stripped readable text | You want article text for search, RAG, or summarization | | `json` | Same as `auto` | Explicit alias | ### Response (200) - `parse=auto` ```json { "url": "https://example.com/article/123", "resolved_url": "https://example.com/article/123", "title": "How tariffs reshaped the supply chain", "text": "The new tariffs took effect in March and within weeks ...", "links": [ { "text": "Read the full report", "href": "https://example.com/report" } ], "metadata": { "description": "An analysis of 2026 supply-chain shifts.", "author": "J. Rivera", "published": "2026-03-14" } } ``` ### Response (200) - `parse=html` ```json { "url": "https://example.com/article/123", "resolved_url": "https://example.com/article/123", "status": 200, "html": " ... 
" } ``` ### Response (200) - `parse=text` ```json { "url": "https://example.com/article/123", "resolved_url": "https://example.com/article/123", "text": "How tariffs reshaped the supply chain\n\nThe new tariffs took effect in March ..." } ``` ## When to use Universal vs a dedicated endpoint | Use Universal when... | Use a dedicated endpoint when... | |---|---| | The site / page type has no dedicated endpoint | The [directory](/scraper-api) lists a **JSON** endpoint for it | | You want the raw HTML to run your own parser | You want clean, validated, typed fields | | You are fetching arbitrary or one-off URLs | You are scraping a known site at scale | | You are feeding page text to an LLM / RAG pipeline | You need precise commerce / job / listing fields | If a card in the [scraper directory](/scraper-api) is marked **via Universal**, it means that site is scraped through this endpoint rather than a hand-tuned parser. You can call `/api/v1/universal/get` directly with the target URL. ## Cost A Universal request costs **5 credits**, the same as a dedicated request, and follows the same rule: **only successful (2xx) responses are billed.** Blocked pages, timeouts, and errors are free. See [Billing](/docs/guides/billing). `render_js` and `screenshot` are reserved for a future release and return `501 not_implemented` today. They will be opt-in add-ons (+10 credits each) when they ship. ## Errors Universal returns the same [error envelope](/docs/guides/errors) as every other endpoint. 
The most common cases: | HTTP | `error` | Meaning | |---|---|---| | `400` | `invalid_params` | `url` missing or malformed | | `401` | `unauthorized` | Missing / bad API key | | `429` | `rate_limited` | Over your concurrency or RPS ceiling | | `502` | `target_unreachable` | The target blocked every internal retry | | `502` | `extraction_failed` | `parse=auto` could not extract structure (try `parse=html` and parse it yourself) | ## Related - [Core concepts](/docs/core-concepts) - dedicated endpoints vs Universal. - [Endpoint reference](/docs/endpoint-reference) - find a dedicated endpoint first. - [Batch endpoint](/docs/endpoints/batch) - run many URLs through Universal asynchronously. - [Billing policy](/docs/guides/billing) - only-2xx billing. --- # Batch endpoint (async) _Source: https://chocodata.com/docs/endpoints/batch_ # `POST /api/v1/{site}/batch` - Async batch scraping Submit a list of product IDs, URLs, or search queries in one HTTP call. The worker processes them in the background and either POSTs the results to a webhook URL you provide or lets you poll for status. ## When to use this - You need to scrape 100+ items and don't want to manage request orchestration yourself - You want fire-and-forget semantics with a webhook callback - You're building a monitoring job that runs periodically across your catalogue Batch works with any endpoint, including the [Universal Web Scraper API](/docs/endpoints/universal) (`universal.get`). For a handful of items, use the plain sync endpoints instead - batch has ~60 s of overhead before processing begins. 
## Request ```http POST /api/v1/walmart/batch?api_key=cd_live_YOUR_KEY Content-Type: application/json ``` ```json { "endpoint": "walmart.product", "items": [ { "query": "5085206428", "domain": "com" }, { "query": "5085206429", "domain": "com" }, { "query": "5085206430", "domain": "com", "language": "es_US" } ], "webhook_url": "https://your.server/webhooks/scrapes" } ``` ### Body parameters | Param | Type | Required | Description | |---|---|---|---| | `endpoint` | `"{site}.{resource}"` | ✅ yes | Which endpoint to call for each item, in `site.resource` form (e.g. `walmart.product`, `google.search`, `indeed.job`, `universal.get`) | | `items` | array (1-1000) | ✅ yes | Each item = same shape as the query string params you'd pass to the sync endpoint | | `webhook_url` | string (https) | - | If set, we POST the completed results here when all items are done. Must be a public URL. | ## Response (201) ```json { "id": "8a1f3b76-2e4c-4a7f-b9e2-1c9d3e5f7a8b", "status": "pending", "total_count": 3, "created_at": "2026-04-20T14:32:01.000Z", "webhook_signature_secret": "whsec_abcd1234...", "poll_url": "/api/v1/walmart/batch/8a1f3b76-2e4c-4a7f-b9e2-1c9d3e5f7a8b" } ``` **Save `webhook_signature_secret` now.** It's only returned once - use it to verify webhook POSTs are genuine. ## Processing Our worker runs every 60 s and processes up to 100 items per run across all pending batches. A 1,000-item batch typically completes within 10-12 minutes. Each item is dispatched through the same pipeline as the sync endpoints (same auth, same billing, same retries). 
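The timing quoted above follows from simple arithmetic, which the Python sketch below spells out alongside a helper that builds a request body in the shape shown in the Request section. The 60-second interval, the 100-items-per-run figure, and the 1,000-item cap are taken from this page; the function names are hypothetical, and the estimate is a best case that assumes no other batches are pending (the per-run budget is shared, so real batches can take longer).

```python
import math

MAX_ITEMS_PER_BATCH = 1000  # documented cap on `items`
ITEMS_PER_RUN = 100         # documented: up to 100 items per worker run
RUN_INTERVAL_S = 60         # documented: the worker runs every 60 s


def build_batch_body(endpoint, queries, webhook_url=None):
    """Build the JSON body for POST /api/v1/{site}/batch."""
    queries = list(queries)
    if not 1 <= len(queries) <= MAX_ITEMS_PER_BATCH:
        raise ValueError("a batch takes between 1 and 1,000 items")
    body = {"endpoint": endpoint, "items": [{"query": q} for q in queries]}
    if webhook_url is not None:
        body["webhook_url"] = webhook_url
    return body


def estimate_minutes(item_count):
    """Best-case minutes to completion if the worker serves only this batch."""
    runs = math.ceil(item_count / ITEMS_PER_RUN)
    return runs * RUN_INTERVAL_S / 60
```

For a full 1,000-item batch the estimate is 10 runs of 60 s each, which lines up with the lower end of the 10-12 minute figure.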
**Only successful (2xx) items are billed.**

## Polling status

```http
GET /api/v1/walmart/batch/8a1f3b76-2e4c-4a7f-b9e2-1c9d3e5f7a8b?api_key=cd_live_YOUR_KEY
```

```json
{
  "id": "8a1f3b76-...",
  "status": "running", // "pending" | "running" | "complete" | "failed"
  "endpoint": "walmart.product",
  "total_count": 3,
  "processed_count": 2,
  "success_count": 2,
  "failure_count": 0,
  "credits_charged": 2,
  "started_at": "2026-04-20T14:32:42.000Z",
  "completed_at": null,
  "webhook_url": "https://your.server/webhooks/scrapes",
  "webhook_delivered_at": null,
  "results": [
    {
      "input": { "query": "5085206428", ... },
      "status": "ok",
      "http_status": 200,
      "data": { /* full product JSON */ },
      "duration_ms": 2851,
      "credits_charged": 1
    },
    {
      "input": { "query": "5085206429", ... },
      "status": "ok",
      "http_status": 200,
      "data": { /* full product JSON */ },
      "duration_ms": 3120,
      "credits_charged": 1
    }
  ]
}
```

## Webhook payload

Once all items complete, we POST to your `webhook_url`:

```http
POST https://your.server/webhooks/scrapes
Content-Type: application/json
X-ASA-Batch-Id: 8a1f3b76-...
X-ASA-Event: batch.completed
X-ASA-Signature: sha256=<hex digest>
```

The body is identical to the `GET /batch/{id}` response. The signature is an HMAC-SHA256 of the raw request body, keyed with your `webhook_signature_secret` and hex-encoded.

### Verifying the signature (Node.js)

```javascript
import crypto from "node:crypto";

function verifyWebhook(req, secret) {
  const sig = req.headers["x-asa-signature"]; // "sha256=<hex digest>"
  if (typeof sig !== "string") return false;

  // Sign the raw request bytes, not the re-serialized JSON.
  // (req.rawBody assumes your framework keeps the unparsed body around.)
  const digest = crypto.createHmac("sha256", secret).update(req.rawBody).digest("hex");
  const expected = Buffer.from(`sha256=${digest}`);
  const received = Buffer.from(sig);

  // timingSafeEqual throws when the lengths differ, so check that first.
  return received.length === expected.length && crypto.timingSafeEqual(received, expected);
}
```

### Verifying in Python

```python
import hashlib
import hmac


def verify_webhook(signature_header: str, raw_body: bytes, secret: str) -> bool:
    # Recompute the HMAC over the raw body and compare in constant time.
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature_header, expected)
```

## Retry policy

If your webhook returns non-2xx, we retry up to 5 times with backoff (1min → 2min → 5min → 15min → 1h), then stop.
Check `webhook_delivery_status` and `webhook_last_error` on the batch row. ## Billing Each item is billed independently per the normal rules. If 800 of 1,000 items succeed, you pay $0.40 (800 × $0.0005). Failed items cost nothing. ## Limits - 1,000 items per batch - No limit on concurrent batches per account - Worker processes ~100 items / minute globally - sustained throughput for a single batch is ~6,000 items/hour ## Related - [Endpoint reference](/docs/endpoint-reference) - which endpoints batch can call - [Product endpoint](/docs/endpoints/product) - [Search endpoint](/docs/endpoints/search) - [Universal Web Scraper API](/docs/endpoints/universal) - [Billing policy](/docs/guides/billing) --- # Authentication _Source: https://chocodata.com/docs/guides/authentication_ # Authentication Every request to `https://api.chocodata.com` must include your API key as an `api_key` query parameter: ```http GET /api/v1/walmart/product?api_key=cd_live_YOUR_KEY_HERE&query=5085206428 ``` ```bash curl "https://api.chocodata.com/api/v1/walmart/product?api_key=cd_live_YOUR_KEY_HERE&query=5085206428" ``` Requests without a valid key return `401 Unauthorized`. Unauthorized requests are free - they never touch your credit balance. ## Key format All keys start with a prefix so you can tell them apart at a glance: | Prefix | Meaning | |---|---| | `cd_live_` | Production traffic. This is what you want 99% of the time. | | `cd_test_` | Sandbox keys (roadmap). Return deterministic fixtures, never hit a live target, never bill. | The 12 characters after the prefix (e.g. `cd_live_Tq0x4k9Pd8Mn`) form the shortprefix displayed in the dashboard and in our logs - enough to identify a key without revealing the full secret. ## Where to get a key Sign up at [app.chocodata.com](https://app.chocodata.com) (Google sign-in or email + password; no card required for the free tier). Go to **Settings > API keys** and click **Generate new key**. 
The full key is shown exactly once at creation time - save it to your secrets manager right away.

## Rotating keys

1. Generate a second key in the dashboard.
2. Deploy the new key to your app.
3. Revoke the old key.

Revocation takes effect within ~30 seconds globally. Revoked keys return `401 Unauthorized` with `{"error":"revoked"}`.

## Multiple keys, workspace-wide

You can have up to 20 active keys per workspace. Usage is aggregated across all of them - your plan's monthly credit allowance is per workspace, not per key. The **Usage** tab lets you filter by key so you can tell which service burned through what.

## Key hygiene (important)

- **Never embed a key in client-side code** - browsers, mobile apps, and public repos are all one-way streets. Proxy requests through your own server.
- **Store keys in a secrets manager.** Not in `.env` files checked into git. Not in Slack screenshots.
- **Use a separate key per environment** (`cd_live_...` for prod, another for staging). When a key leaks, you revoke only the affected environment.
- **Rotate on a schedule** - quarterly is a reasonable cadence. If a key ever appears in logs, git history, or a paste bin, rotate immediately.

## IP allow-lists (roadmap)

Restricting a key to a list of egress IPs will be available on Pro and Custom plans. Write to support if you need it sooner - we can enable it manually.

## What happens to usage when a key is revoked

Historical usage for a revoked key stays in your dashboard for 90 days under **Usage**. The key itself can no longer authenticate, so no new charges are possible against it.

## Related

- [Getting started](/docs/getting-started) - make your first request
- [Core concepts](/docs/core-concepts) - how the API works
- [Error codes](/docs/guides/errors) - full list of `401` reasons
- [Rate limits & concurrency](/docs/guides/rate-limits) - per-key ceilings

---

# Error codes & retry semantics

_Source: https://chocodata.com/docs/guides/errors_

# Error codes

Every non-2xx response is free.
That's the contract - if we didn't give you a clean product, you didn't pay for it.

All errors return a JSON body with the same shape:

```json
{
  "error": "string_code",
  "message": "human-readable explanation",
  "docs_url": "https://chocodata.com/docs/guides/errors#string_code",
  "request_id": "req_9f2c8a1b3e4d"
}
```

The `request_id` is what we'll ask for if you email support. `docs_url` deep-links to the row in this table.

## Error table

| HTTP | `error` code | When it happens | Retry? |
|---|---|---|---|
| `400` | `invalid_params` | Query identifier malformed, unsupported `domain` value, `language` not in the enum, missing `query` or `url`. | No - fix the request. |
| `401` | `unauthorized` | Missing or malformed `api_key`, or unknown key. | No - get a valid key. |
| `401` | `revoked` | Key was rotated out. | No - use your current key. |
| `402` | `insufficient_credits` | You're out of credits and have no active auto-top-up. | Yes, after topping up. |
| `404` | `not_found` | You hit an unknown path. Check the endpoint URL. | No. |
| `404` | `item_not_found` | The target site returned 404 for the requested item: it's delisted, the identifier is malformed, or it never existed in this region. The response includes `retryable: false`. | No - drop the item from your set, or try a different region. |
| `408` | `upstream_timeout` | The target site took > 45s to respond. | **Yes** - retry up to 3 times with 2s backoff. |
| `422` | `blocked_by_target` | The target site served a challenge page or 503. We already retried internally. | Yes, but with a 60s+ delay. |
| `429` | `rate_limited` | You exceeded your per-key RPS or concurrency ceiling. Check the `Retry-After` header. | **Yes** - respect `Retry-After`. |
| `500` | `internal_error` | Unhandled exception on our side. These are always logged. | Yes, after ~5s. |
| `501` | `not_implemented` | You passed a roadmap param (`render_js`, `screenshot`). | No - wait for the feature. |
| `502` | `target_unreachable` | The target site blocked every internal retry. | Yes, with a 60s+ delay. |
| `502` | `extraction_failed` | The target site served markup that our extractor couldn't parse. Usually a new layout; we auto-file a ticket. | Yes, typically works on retry. |
| `502` | `id_mismatch` | You asked for item A, the site redirected to item B (usually a swap to a newer version of the item). | No - the source identifier is what's stale. |
| `502` | `placeholder_page` | The target served a generic placeholder instead of the requested item (the item is delisted). | No - remove the item from your set. |
| `503` | `capacity` | We're rate-limiting you to protect the shared pool. Rare. | Yes, after `Retry-After`. |

## Retry semantics - the defaults we recommend

Use **exponential backoff with jitter** for anything marked "Yes" above. Because `502` covers both retryable and permanent codes, check the `error` field as well as the HTTP status. Here's a canonical loop you can port to any language:

```javascript
// 502 codes that the table above marks as not worth retrying.
const PERMANENT_CODES = new Set(["id_mismatch", "placeholder_page"]);

async function scrapeWithRetry(fetchFn, maxAttempts = 4) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFn();
    if (res.ok) return res.json();

    // Don't retry permanent errors. (402 insufficient_credits is also
    // terminal until you top up - handle it separately if you auto-refill.)
    if ([400, 401, 404, 501].includes(res.status)) {
      throw new Error(`permanent: ${res.status}`);
    }

    // Some 502s are permanent too, so look at the error code in the body.
    const body = await res.json().catch(() => ({}));
    if (PERMANENT_CODES.has(body?.error)) {
      throw new Error(`permanent: ${body.error}`);
    }

    if (attempt === maxAttempts) throw new Error(`gave up after ${attempt}`);

    // Respect our Retry-After if we sent one; otherwise back off exponentially.
    const retryAfter = Number(res.headers.get("Retry-After")) * 1000;
    const backoff = retryAfter || Math.min(2 ** attempt * 1000, 30_000);
    await new Promise(r => setTimeout(r, backoff + Math.random() * 1000));
  }
}
```

Our official SDKs (Node, Python, Go, CLI) apply this policy automatically. You only need to bake it in yourself if you're calling the raw HTTP API.

## Debugging a failed request

Every error response carries `request_id` in the body.
If you email support with that ID, we can pull the full internal trace - upstream status, attempts, latency per attempt, extractor output. Keep it in your own logs for at least 7 days so we don't both have to guess later.

## Related

- [Core concepts](/docs/core-concepts) - the response and error envelope
- [Authentication](/docs/guides/authentication) - key format + rotation
- [Rate limits & concurrency](/docs/guides/rate-limits) - per-plan ceilings
- [Billing policy](/docs/guides/billing) - only-2xx rule in detail

---

# Rate limits & concurrency

_Source: https://chocodata.com/docs/guides/rate-limits_

# Rate limits & concurrency

Two ceilings apply to every API key:

1. **Concurrency** - how many requests can be in-flight at the same moment.
2. **Sustained throughput** - how many requests per second, averaged over a short window, you can send before we start returning `429`.

Both are generous enough that most customers never hit them, and both scale with your plan.

## Per-plan ceilings

| Plan | Concurrent in-flight | Sustained RPS | Burst ceiling |
|---|---|---|---|
| Free ($0/mo) | 10 | 2 | 10 for 3s |
| Vibe ($19/mo) | 30 | 10 | 40 for 3s |
| Pro ($49/mo) | 50 | 25 | 60 for 3s |
| Custom ($100-$2,000/mo) | 100 - 500+ | 50 - 500 | 2x sustained for 3s |

"Burst ceiling" means you can briefly exceed the sustained RPS by that much before we start pushing back. Once the 3s window closes, you're back to the sustained figure.

## What you see when you hit a limit

A `429` response with a standard `Retry-After` header:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 2
Content-Type: application/json

{
  "error": "rate_limited",
  "message": "Exceeded sustained RPS (25) for key cd_live_Tq0x...",
  "docs_url": "https://chocodata.com/docs/guides/rate-limits",
  "request_id": "req_9f2c..."
}
```

**Always respect `Retry-After`.** All of our official SDKs (Node / Python / Go / CLI) do this automatically.
If you're calling raw HTTP, the pattern is:

```javascript
// sleep() and retry() stand in for your own helpers.
if (res.status === 429) {
  await sleep(Number(res.headers.get("Retry-After") || 1) * 1000);
  return retry();
}
```

## Checking your current utilisation

Every successful response carries two informational headers:

- `Asa-Concurrency: 3/30` - you have 3 requests in-flight out of 30 allowed.
- `Asa-Rps: 8/10` - you're sending 8 RPS on average in the current window against a 10 RPS ceiling.

Plot these in your own observability stack and you'll see headroom before you run out of it.

## Concurrency, explained

If your ceiling is 30, you can have up to 30 requests open to our API at once. Open a 31st and it queues client-side (in our SDKs) or gets a `429` immediately (raw HTTP). Long-running requests - for example a Batch poll that takes 8 minutes - consume a concurrency slot the whole time.

Rule of thumb: **Concurrency = Target RPS x Average request duration (seconds)**. At 10 RPS with a 3 s average, you need 30 concurrent slots. That's exactly what the Vibe plan ships, and Pro's 50 covers roughly 16 RPS at the same average duration (50 / 3 ≈ 16.7).

## Batch endpoint

Batch submissions themselves are cheap - one `POST /api/v1/{site}/batch` with 1,000 items consumes one concurrency slot for the duration of the POST (a few seconds). The *items* inside the batch are processed by our worker under a separate internal concurrency budget that doesn't count against yours.

## How to get more

- **Upgrade.** Each paid tier raises both ceilings; see the per-plan table above for the exact figures.
- **Custom plan.** If you're already on Pro and hitting 50 concurrent, the Custom tier ladder runs from 100 to 500+ concurrent, up to $2,000/mo.
- **Short-term boosts.** Email support with the numbers you need and how long you need them for. If it fits the pool budget, we'll flip it on without requiring a plan change.

## Preventing bursts in your own code

Even when you have the headroom, a polite client is a resilient client.
The Node SDK exposes a built-in token-bucket limiter:

```javascript
import { Chocodata } from "chocodata";

const chocodata = new Chocodata("cd_live_…", {
  maxConcurrency: 30, // defaults to your plan's ceiling if omitted
  maxRps: 10,
});

// Now you can submit 10,000 items at once; the SDK paces them.
// `ids` is your own array of product IDs.
const results = await Promise.all(
  ids.map(id => chocodata.product({ site: "walmart", query: id }))
);
```

The Python and Go SDKs and the CLI ship the same limiter.

## Related

- [Authentication](/docs/guides/authentication)
- [Error codes](/docs/guides/errors)
- [SDK quick-starts](/docs/guides/sdks)

---

# SDKs, CLI & MCP

_Source: https://chocodata.com/docs/guides/sdks_

# SDKs, CLI & MCP

We publish official SDKs for the languages most customers use, plus a CLI and an MCP server. Each wraps the HTTP API with typed helpers, built-in retry/backoff on 408/429/5xx, and a token-bucket rate limiter that respects your plan's ceiling. Pick the one that matches your stack and skip the raw `fetch` code.
| Surface | Package | Install | |---|---|---| | Node.js / TypeScript | [`chocodata`](https://www.npmjs.com/package/chocodata) | `npm install chocodata` | | Python | [`chocodata`](https://pypi.org/project/chocodata/) | `pip install chocodata` | | Go | [`github.com/Chocodata-com/chocodata-go`](https://pkg.go.dev/github.com/Chocodata-com/chocodata-go) | `go get github.com/Chocodata-com/chocodata-go` | | CLI (any language) | [`chocodata-cli`](https://www.npmjs.com/package/chocodata-cli) | `npm install -g chocodata-cli` | | MCP server (AI agents) | [`chocodata-mcp`](https://www.npmjs.com/package/chocodata-mcp) | `npx chocodata-mcp` | ## Node / TypeScript ```bash npm install chocodata ``` ```typescript import { Chocodata } from "chocodata"; const chocodata = new Chocodata("cd_live_YOUR_KEY"); const product = await chocodata.product({ site: "walmart", query: "5085206428", }); console.log(product.title, product.price); ``` Typed responses, auto-pagination on Search, webhook-signature verification on Batch. Works in Node 18+ and any ESM/CJS setup. See the [npm README](https://www.npmjs.com/package/chocodata) for the full reference. ## Python ```bash pip install chocodata ``` ```python from chocodata import Chocodata chocodata = Chocodata(api_key="cd_live_YOUR_KEY") product = chocodata.product(site="walmart", query="5085206428") print(product["title"], product["price"]) ``` Works on Python 3.9+. Sync and async variants ship side by side (`chocodata.product` vs `await chocodata.product_async`). Type hints via `TypedDict`. 
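To show how the async variant might be used for a small fan-out, here is a hedged Python sketch. It assumes `product_async` accepts the same keyword arguments as `product` (this page names the method but does not show its signature), and `fetch_products` is a hypothetical helper of our own, not something the SDK ships. The client is passed in rather than constructed, so the helper works with any object exposing that method.

```python
import asyncio


async def fetch_products(client, ids, site="walmart"):
    """Fetch several products concurrently.

    Returns (results, errors): two dicts keyed by product ID.
    Assumes client.product_async(site=..., query=...) is awaitable.
    """
    ids = list(ids)
    tasks = [client.product_async(site=site, query=pid) for pid in ids]
    # return_exceptions=True keeps one failed item from cancelling the rest.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)

    results, errors = {}, {}
    for pid, outcome in zip(ids, outcomes):
        if isinstance(outcome, Exception):
            errors[pid] = outcome
        else:
            results[pid] = outcome
    return results, errors
```

With the real SDK this would be driven by something like `asyncio.run(fetch_products(Chocodata(api_key="cd_live_YOUR_KEY"), ["5085206428", "5085206429"]))`. Collecting failures separately lets a long run finish and report which IDs to retry or drop.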
## Go ```bash go get github.com/Chocodata-com/chocodata-go ``` ```go package main import ( "context" "fmt" "log" chocodata "github.com/Chocodata-com/chocodata-go" ) func main() { client := chocodata.New("cd_live_YOUR_KEY") data, err := client.Product(context.Background(), chocodata.ProductParams{ Site: "walmart", Query: "5085206428", }) if err != nil { log.Fatal(err) } fmt.Println(data.Title, data.Price) } ``` Context-aware cancellation, idiomatic Go error wrapping, zero external deps beyond `net/http`. ## CLI The CLI is handy for quick one-off scrapes from a terminal or shell script. ```bash npm install -g chocodata-cli chocodata login cd_live_YOUR_KEY # stored in ~/.chocodata/config # Single product chocodata walmart product --query 5085206428 # Search chocodata google search --query "laptop" --pages 2 # Pipe a list of IDs from a file cat ids.txt | chocodata walmart product --json-per-line > out.ndjson ``` The CLI outputs JSON by default, or NDJSON (one row per line) with `--json-per-line` - easy to pipe into `jq`, `awk`, or a warehouse loader. ## MCP server (Claude, Cursor, and other agents) The MCP server exposes Chocodata as tools that an AI agent can call directly, so a coding agent or chat client can scrape the web without you writing glue code. It runs over stdio and works with any MCP-compatible client (Claude Desktop, Cursor, Windsurf, and others). Add it to your client's MCP config: ```json { "mcpServers": { "chocodata": { "command": "npx", "args": ["-y", "chocodata-mcp"], "env": { "CHOCODATA_API_KEY": "cd_live_YOUR_KEY" } } } } ``` The agent then has tools for the dedicated endpoints and the [Universal Web Scraper API](/docs/endpoints/universal): point it at a site and resource, or hand it a URL, and it returns structured JSON. Pricing and the only-2xx billing rule are identical to the HTTP API. 
## Raw HTTP (no SDK) For stacks we don't publish an SDK for yet (Java, Ruby, PHP, C#), the API is a plain `GET` with your key in the `?api_key=` query parameter. Every endpoint page has a `cURL` snippet you can port. Examples: - [Product endpoint](/docs/endpoints/product) - [Search endpoint](/docs/endpoints/search) - [Universal Web Scraper API](/docs/endpoints/universal) - [Async Batch endpoint](/docs/endpoints/batch) ## What the SDKs do that raw HTTP doesn't - **Exponential backoff with jitter** on 408 / 429 / 5xx. - **Client-side rate limiter** so you don't have to think about your plan's ceiling. - **Webhook-signature verification** (Batch endpoint) in one line. - **Typed response shapes** - autocomplete every product field in your editor. - **Sensible defaults** - `Accept-Encoding: gzip`, `User-Agent` identifying the SDK + version, a 30s request timeout you can override per call. If you ever find the SDK doing something unexpected, pass `debug: true` at construction and you'll get every HTTP call printed to stderr with timings. ## Related - [Authentication](/docs/guides/authentication) - API key format and rotation - [Rate limits & concurrency](/docs/guides/rate-limits) - what the built-in limiter is protecting you from - [Error codes](/docs/guides/errors) - what the SDK's retry policy is reacting to --- # Country, region, and content language _Source: https://chocodata.com/docs/guides/country-and-language_ # Country, region, and content language Three distinct concepts - don't conflate them: | Concept | What you control | Example | |---|---|---| | **Region (TLD)** | `domain` param | `com` → the US store, `de` → the German store | | **Content language** | `language` param | `en_US`, `de_DE`, `es_MX` | | **Proxy egress country** | automatic, not exposed | German store → DE residential IP | ## Proxy egress is automatic You don't configure proxy egress. When you hit `?domain=de` we automatically route through a residential IP in Germany. 
When you hit `?domain=co.jp` we route through Japan. This matching reduces the target site's anti-bot score and gives you the real desktop-locale HTML. There was briefly a `country` parameter that let customers force a specific egress IP. We removed it - obvious mismatches (e.g. scraping a German store through a US IP) trigger anti-bot defenses disproportionately and add complexity without customer value. ## Content language override Some sites serve a region in multiple languages. Pass `?language=xx_YY` to override: ```http GET /api/v1/{site}/product?query=PRODUCT_ID&domain=com&language=es_US ``` This maps to the site's `?language=es_US` URL parameter. The site then serves the Spanish US-region content: prices in USD, shipping from US warehouses, but titles, descriptions, and bullets translated to Spanish. ## Supported languages per region (a worked example) Multi-region sites follow the same `domain` + `language` model, though supported values vary by site. The table below uses one large multi-region marketplace's storefronts as a concrete, empirically tested example of how regions map to default and alternate languages. Treat it as an illustration of the pattern, not a global list. 
| Marketplace | Default | Other languages that work | |---|---|---| | `com` (US) | `en_US` | `de_DE`, `es_ES`, `es_US`, `zh_CN`, `pt_BR`, `ar_AE`, `en_GB` | | `co.uk` (UK) | `en_GB` | `de_DE`, `fr_FR`, `pl_PL`, `ro_RO`, `en_US` | | `de` (Germany) | `de_DE` | `en_GB`, `en_US`, `nl_NL`, `pl_PL`, `tr_TR`, `cs_CZ`, `fr_FR` | | `fr` (France) | `fr_FR` | `en_GB`, `de_DE`, `nl_NL`, `ar_AE` | | `it` (Italy) | `it_IT` | `en_GB`, `en_US`, `de_DE` | | `es` (Spain) | `es_ES` | `en_GB`, `en_US`, `pt_BR`, `pt_PT`, `ca_ES` | | `nl` (Netherlands) | `nl_NL` | `en_GB`, `de_DE` | | `pl` (Poland) | `pl_PL` | (monolingual) | | `se` (Sweden) | `sv_SE` | `en_GB` | | `ca` (Canada) | `en_CA` | `en_US`, `en_GB`, `fr_CA`, `fr_FR`, `zh_CN` | | `com.mx` (Mexico) | `es_MX` | `es_ES`, `en_US` | | `com.br` (Brazil) | `pt_BR` | `en_US` | | `com.au` (Australia) | `en_AU` | `en_US`, `en_GB`, `zh_CN` | | `co.jp` (Japan) | `ja_JP` | `en_US`, `en_GB`, `zh_CN`, `ko_KR` | | `sg` (Singapore) | `en_SG` | `en_US`, `en_GB`, `zh_CN` | | `in` (India) | `en_IN` | `en_US`, `hi_IN` | | `com.tr` (Turkey) | `tr_TR` | `en_US`, `en_GB` | | `ae` (UAE) | `en_AE` | `ar_AE`, `en_US` | | `sa` (Saudi Arabia) | `ar_SA` | `en_US`, `en_GB` | | `eg` (Egypt) | `ar_EG` | `en_US`, `en_GB` | If you pass a language that the region doesn't support, the site silently falls back to the region's default. No error from us. ## What doesn't change with `language` The `language` parameter only affects **presentation** (titles, descriptions, bullets, UI text). It doesn't affect: - **Price** - always shown in the region's native currency - **Availability / stock** - physical availability is per-region, not per-language - **Shipping** - ships from the region - **Reviews** - a site may show reviews from multiple sources; the language flag doesn't filter them If you need Spanish-language product data *from* the Mexican store, use `domain=com.mx` (which defaults to `es_MX`). 
If you want Spanish-language data *from* the US store (available to US-Spanish speakers), use `domain=com&language=es_US`. ## Related - [Core concepts](/docs/core-concepts) - [Product endpoint](/docs/endpoints/product) - [Search endpoint](/docs/endpoints/search) --- # Scraping Social Media & App Stores _Source: https://chocodata.com/docs/guides/social-and-app-stores_ # Scraping Social Media & App Stores Chocodata now covers the social and app-store surfaces alongside the rest of the web. The same single GET you already use for e-commerce and search returns structured JSON for apps, videos, posts, profiles, and job listings: ```http GET https://api.chocodata.com/api/v1/{target}/{resource}?api_key=YOUR_KEY&... ``` No login, no session juggling, no per-platform SDK. Pass the identifier (an app ID, a video ID, a username, a post URL) and get back typed fields. ## What you can pull | Platform | `{target}` | Resources | |---|---|---| | Apple App Store | `appstore` | `search`, `product`, `reviews` | | Google Play | `googleplay` | `product`, `search` | | YouTube | `youtube` | `video`, `channel`, `comments`, `transcript` | | Reddit | `reddit` | `subreddit`, `post`, `user`, `search` | | TikTok | `tiktok` | `video`, `profile`, `oembed` | | X (Twitter) | `xtwitter` | `tweet` | | Instagram | `instagram` | `profile`, `post` | | Facebook | `facebook` | `page`, `post` | | LinkedIn | `linkedin` | `jobsearch`, `job`, `company` | A few highlights: - **App stores, both sides.** Search a storefront, pull a single app's full listing (price, version, ratings, genres, screenshots), and page through user reviews on the App Store, plus the Google Play equivalent. - **YouTube, including transcripts and comments.** Beyond video metadata and channel uploads, `youtube.transcript` returns the timed caption track as timestamped segments and one joined string, and `youtube.comments` returns top or newest comments with author, like count, and reply count. 
- **Reddit, with scores.** Posts and comments carry their `score`, `upvote_ratio`, and `num_comments`, so you can rank and filter by engagement straight from the response. - **TikTok, X, Instagram, Facebook, LinkedIn.** Video and post detail with engagement stats, public profiles and company pages, and LinkedIn job search plus single-job detail. ## A first call ```bash curl "https://api.chocodata.com/api/v1/youtube/transcript?api_key=YOUR_KEY&video_id=dQw4w9WgXcQ" ``` ```json { "video_id": "dQw4w9WgXcQ", "language": "en", "is_generated": false, "segment_count": 61, "segments": [ { "text": "♪ We're no strangers to love ♪", "start": 18640, "duration": 3240 } ], "text": "♪ We're no strangers to love ♪ ♪ You know the rules and so do I ♪…" } ``` The shape is the same one every Chocodata endpoint follows, so if you've used the [Product](/docs/endpoints/product) or [Search](/docs/endpoints/search) endpoints, these will feel familiar. ## Per-endpoint reference The full request shape, every parameter (with allowed enum values), and a real example response for each of these endpoints live in the endpoint reference: - [App Stores](/docs/endpoint-reference#app-stores) - `appstore` and `googleplay`. - [Social Media](/docs/endpoint-reference#social-media) - `reddit`, `youtube`, `tiktok`, `xtwitter`, `instagram`, `facebook`, and `linkedin`. ## Related - [Endpoint reference](/docs/endpoint-reference) - the resource model and every endpoint's request/response. - [Product endpoint](/docs/endpoints/product) - the specific-item pattern these follow. - [Search endpoint](/docs/endpoints/search) - the list pattern. - [Getting started](/docs/getting-started) - your first scrape in two minutes. --- # Billing policy _Source: https://chocodata.com/docs/guides/billing_ # Billing policy **Only successful (HTTP 2xx) requests cost credits.** Everything else is free. ## The rule set | Outcome | Bill? 
| |---|---| | Auth failure (401) | ❌ Free | | Bad params (400) | ❌ Free | | Rate limit from us (429) | ❌ Free | | Roadmap param (501 `render_js`/`screenshot`) | ❌ Free | | Target-site timeout or network failure | ❌ Free | | Target 503 / challenge page detected | ❌ Free | | Target 302 to login | ❌ Free | | Target returned 404 (404 `item_not_found`, item delisted) | ❌ Free | | Target served something, extractor found nothing (502 `extraction_failed`) | ❌ Free | | ID mismatch - requested item A, got item B (502 `id_mismatch`) | ❌ Free | | Partial result (core fields present, some optional fields missing) | ✅ Billed | | Full result | ✅ Billed | ## Why so generous? Because you shouldn't pay for our failures. When a target site blocks a request, we absorb the infrastructure cost and return a clean error to you. You get what you paid for or you pay nothing. Our current success rate is ~97 %, and the cost of the failed 3 % is a rounding error on our margins. The simpler rule - "only 2xx costs credits" - is worth more to us in customer trust than the pennies we'd save by billing borderline cases. ## Credit cost per request Any single request (a dedicated endpoint or the [Universal Web Scraper API](/docs/endpoints/universal)) costs **5 credits**. Two opt-in add-ons are reserved for a future release: JavaScript rendering adds **+10 credits** (15 total) and a screenshot adds **+10 credits**. The credit unit is just the internal accounting unit: customer-facing pricing is quoted in requests, at **$0.90 per 1,000 successful requests** on PAYG, and cheaper on subscriptions. ## The one edge case worth flagging If a result returns valid fields but some optional ones are missing (e.g. `price: null` for an item with multiple format editions, `variations: []` for a single-variant product), you're billed. The response IS the item - nothing is missing because the extractor failed; it's missing because the site didn't publish it. A successful 2xx with real data costs 5 credits. 
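To make the rule concrete, the Python sketch below tallies billable credits from a list of responses. The 5-credit cost and the 2xx-only rule come from this page; the function names and the sample data are made up for the example, and the tally ignores the future `render_js` and `screenshot` add-ons.

```python
CREDITS_PER_REQUEST = 5  # documented cost of one successful request


def credits_for(status: int) -> int:
    """Credits charged for one response, judged by HTTP status alone."""
    return CREDITS_PER_REQUEST if 200 <= status < 300 else 0


def total_credits(responses) -> int:
    """Sum credits over (status, body) pairs; the body never affects the bill."""
    return sum(credits_for(status) for status, _body in responses)


# Illustrative data only: a full result, a partial result, and two failures.
sample = [
    (200, {"title": "Cookware set", "price": 299.99}),
    (200, {"title": "Multi-format item", "price": None}),  # partial, still billed
    (404, {"error": "item_not_found"}),
    (502, {"error": "extraction_failed"}),
]
```

Here `total_credits(sample)` counts the two 2xx rows at 5 credits each, so the partial result with `price: None` is billed exactly like the full one and both failures are free.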
## Retries Our orchestrator handles retries internally - you send one request, we do the work of getting it through. **You're billed at most once per request**, only if the final result is a successful 2xx. The number of internal attempts used is surfaced in the `Asa-Attempts` response header for your own debugging. ## Batch billing Each item in a batch is billed independently. 800/1000 successes at 5 credits each = 4,000 credits charged. Failures in the batch are free. The failure reason is in each item's `error` field. ## Checking your usage Every request, successful or not, appears in your dashboard at `/app/usage`. Failed requests display `$0` for Cost and a clear reason code. Use them to debug, not worry. ## Versioning Any change to this billing policy is announced 14 days in advance via email and reflected in both this page and the [pricing FAQ](https://chocodata.com/pricing#faq) simultaneously. In-flight requests are settled under the old policy. ## Related - [Core concepts](/docs/core-concepts) - the credit model - [Error codes](/docs/guides/errors) - every non-2xx reason, all free - [Pricing](https://chocodata.com/pricing) - plan limits and PAYG