PDF Parsing API Guide

Basic Information

Base URL

Use the following base URL for all API requests:

https://www.kolmopdf.com

General Notes

  1. Network access: Connect to the API directly. Outside mainland China, temporary network instability may interrupt uploads.
  2. Data retention: Once the status endpoint reports that the task is completed, download the result as soon as possible. Result files are kept on the server for 7 days only.
    • Image URL retention: When images_as_url=true, generated image URLs are cached for 30 days and are removed after they expire.
  3. File limits:
    • Maximum file size: 300 MB
    • Maximum page count per PDF: 800 pages
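
A client-side check against these limits can avoid a wasted upload. The following is a minimal sketch, not part of the API: the name check_limits is hypothetical, the page count is assumed to come from whatever PDF library you already use, and 300 MB is assumed to mean 300 × 1024 × 1024 bytes.

```python
MAX_FILE_BYTES = 300 * 1024 * 1024  # assumption: 300 MB counted as binary megabytes
MAX_PAGES = 800                     # documented page limit per PDF


def check_limits(size_bytes: int, page_count: int) -> list[str]:
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"PDF has {page_count} pages, limit is {MAX_PAGES}")
    return problems
```

An empty return value means the upload can proceed; the server remains the final authority on both limits.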

Authentication

Getting an API Key

Apply for and manage API keys on the PDF Parsing API page of the KolmoPDF website.

  1. Advanced plan members: May create 1 API key. API usage consumes the same point balance as the web account.
  2. Team plan members: May create up to 10 API keys. Each key can have its own usage limit, which makes it easier to allocate quotas to subprojects or partners.

Point limit logic

The actual usable balance for a key is the lower of:

  • your account's remaining total balance
  • the remaining limit configured for that key
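
As a worked example of this rule, the sketch below computes the usable balance for a key. The function name usable_balance is hypothetical and the numbers are illustrative only.

```python
def usable_balance(account_remaining: int, key_limit_remaining: int) -> int:
    """Points a key can actually spend: the lower of the two remaining amounts."""
    return min(account_remaining, key_limit_remaining)


# The account has 500 points left, but this key's own limit has only 120 left.
print(usable_balance(500, 120))   # prints 120
# The key limit is generous, but the account is nearly empty.
print(usable_balance(40, 1000))   # prints 40
```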

Request Headers

Send the API key either as a Bearer token in the Authorization header or in the X-API-Key header.

| Name | Example | Description |
| --- | --- | --- |
| Authorization | Bearer sk-xxx | Replace sk-xxx with your real API key. |
| X-API-Key | sk-xxx | Replace sk-xxx with your real API key. |

You can also pass the key as a URL parameter:

| Parameter | Example | Description |
| --- | --- | --- |
| api_key | sk-xxx | Replace sk-xxx with your real API key. |
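
The three ways of passing the key can be sketched in Python as follows. The helper names auth_headers and url_with_key are hypothetical; only the header names and the api_key parameter come from this guide. Headers are generally the safer choice, since URLs tend to end up in logs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_headers(api_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Build request headers for one of the two documented header styles."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


def url_with_key(url: str, api_key: str) -> str:
    """Append api_key as a URL parameter, keeping any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```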

Async Processing Flow

KolmoPDF uses a three-step async workflow:

upload and parse -> poll status -> download result

1. Upload and Parse

POST /api/pdf-to-markdown-proxy/parse

Upload a PDF file and create a parsing task.

Request Parameters

| Name | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| file | FormData | file | Yes | PDF file in binary format. |
| table_mode | FormData | string | No | Table output mode. markdown converts tables into Markdown tables. image keeps them as images. Default: markdown. |
| enable_translation | FormData | string | No | Whether translation is enabled. true or false. Default: false. |
| images_as_url | FormData | string | No | Whether image references should be returned as public URLs. true or false. Default: false. When true, the final output is a Markdown file instead of a ZIP archive. |
| target_language | FormData | string | No | Target language code. Only valid when enable_translation=true. Supported: zh, en, ja, ko, fr, de, es, ru. Default: zh. |
| output_options | FormData | string | No | Translation output mode. Available: original, translated, bilingual. Multiple values can be joined with commas. Default: original. |
| text | FormData | string | No | Pure text mode. true or false. Default: false. Useful when you only need text transcription and want a simpler prompt. |
| skip_rotation_detection | FormData | string | No | Skip auto-rotation detection. true or false. Default: false. |
| enable_cross_page_merge | FormData | string | No | Enable smart cross-page table merging for up to three consecutive pages. true or false. Default: false. |

Request Examples

Windows (CMD; in PowerShell, call curl.exe and replace each trailing ^ with a backtick):

curl -X POST "https://www.kolmopdf.com/api/pdf-to-markdown-proxy/parse?api_key=sk-xxx" ^
  -F "file=@document.pdf" ^
  -F "table_mode=markdown" ^
  -F "enable_translation=false" ^
  -F "text=false" ^
  -F "skip_rotation_detection=false" ^
  -F "enable_cross_page_merge=true"

Linux / macOS:

curl -X POST 'https://www.kolmopdf.com/api/pdf-to-markdown-proxy/parse?api_key=sk-xxx' \
  -F "file=@document.pdf" \
  -F "table_mode=markdown" \
  -F "enable_translation=false" \
  -F "text=false" \
  -F "skip_rotation_detection=false" \
  -F "enable_cross_page_merge=true"
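
For callers using Python rather than curl, the form fields can be prepared and validated before sending. This is a minimal sketch: build_parse_fields is a hypothetical helper, and sending target_language and output_options only when translation is enabled is an assumption, not documented behavior.

```python
TABLE_MODES = {"markdown", "image"}
LANGUAGES = {"zh", "en", "ja", "ko", "fr", "de", "es", "ru"}
OUTPUT_OPTIONS = {"original", "translated", "bilingual"}


def build_parse_fields(
    table_mode: str = "markdown",
    translate: bool = False,
    target_language: str = "zh",
    output_options: tuple[str, ...] = ("original",),
    images_as_url: bool = False,
    text: bool = False,
    skip_rotation_detection: bool = False,
    enable_cross_page_merge: bool = False,
) -> dict[str, str]:
    """Validate options and return them as string form fields (file not included)."""
    if table_mode not in TABLE_MODES:
        raise ValueError(f"unsupported table_mode: {table_mode!r}")

    def flag(value: bool) -> str:
        # Boolean options are documented as the strings "true" / "false".
        return "true" if value else "false"

    fields = {
        "table_mode": table_mode,
        "enable_translation": flag(translate),
        "images_as_url": flag(images_as_url),
        "text": flag(text),
        "skip_rotation_detection": flag(skip_rotation_detection),
        "enable_cross_page_merge": flag(enable_cross_page_merge),
    }
    if translate:
        # Assumption: translation-only fields are sent only when translation is on.
        if target_language not in LANGUAGES:
            raise ValueError(f"unsupported target_language: {target_language!r}")
        unknown = set(output_options) - OUTPUT_OPTIONS
        if unknown:
            raise ValueError(f"unsupported output_options: {sorted(unknown)}")
        fields["target_language"] = target_language
        fields["output_options"] = ",".join(output_options)  # comma-joined values
    return fields
```

The returned dictionary holds only the text fields. With the third-party requests library, for example, it could be passed as data= alongside files={"file": ...} in a requests.post call to the /parse endpoint.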

Success Example: Processing Started

{
  "success": true,
  "task_id": "12345",
  "status": "processing",
  "message": "Task created successfully",
  "points_deducted": 20,
  "remaining_points": 80
}

Success Example: Waiting in Queue

{
  "success": true,
  "task_id": "12345",
  "status": "waiting",
  "message": "Task queued and waiting for processing",
  "points_deducted": 20,
  "remaining_points": 80,
  "queue_info": {
    "position": 1,
    "ahead_tasks": 3
  }
}

Failure Example: Insufficient Points

{
  "success": false,
  "message": "Insufficient points",
  "error_code": "insufficient_points",
  "points_required": 20,
  "current_points": 15
}

Failure Example: Invalid File Type

{
  "success": false,
  "message": "File is not a PDF file",
  "error_code": "parse_file_not_pdf"
}

Failure Example: File Too Large

{
  "success": false,
  "message": "File size exceeds limit (300MB)",
  "error_code": "parse_file_too_large",
  "file_size": 314572800,
  "max_size": 314572800
}

Failure Example: Page Limit Exceeded

{
  "success": false,
  "message": "Page count exceeds limit (800 pages)",
  "error_code": "parse_page_limit_exceeded",
  "page_count": 1000,
  "max_pages": 800
}

Point Consumption Rules

| Service | Cost |
| --- | --- |
| Parsing only | 2 points per page |
| Parsing + translation | 3 points per page |
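
The cost of a job can be estimated before uploading. The sketch below assumes billing is strictly per page with no rounding or minimum charge, which this guide does not state; estimate_points is a hypothetical helper.

```python
def estimate_points(page_count: int, translate: bool = False) -> int:
    """Estimate points for a PDF: 2 per page, or 3 per page with translation."""
    per_page = 3 if translate else 2
    return page_count * per_page


print(estimate_points(10))        # 10 pages, parsing only: prints 20
print(estimate_points(10, True))  # 10 pages with translation: prints 30
```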

2. Check Task Status

GET /api/pdf-to-markdown-proxy/status/{task_id}

Poll the task status. A polling interval of 1 to 3 seconds is recommended.

Success Example

{
  "success": true,
  "status": "completed",
  "message": "Processing completed",
  "result": {
    "task_id": "01920000-0000-0000-0000-000000000000",
    "download_url": "/api/pdf-to-markdown-proxy/download/01920000-0000-0000-0000-000000000000"
  }
}

Processing Example

{
  "success": true,
  "status": "processing",
  "message": "Processing"
}

Waiting Example

{
  "success": true,
  "status": "waiting",
  "message": "Waiting in queue (3 tasks ahead)",
  "queue_info": {
    "position": 1,
    "ahead_tasks": 3
  }
}

Failure Example

{
  "success": false,
  "status": "failed",
  "message": "Parsing error",
  "error_code": "parse_error"
}

If a task fails after it was created successfully, deducted points are automatically refunded.
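
A polling loop for this endpoint might look like the sketch below. It is deliberately independent of any HTTP client: fetch_status is a caller-supplied function (hypothetical) that performs the GET request and returns the decoded JSON body. The 600-second client-side timeout is an assumption, not a documented server limit.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,    # within the recommended 1 to 3 seconds
    timeout: float = 600.0,   # assumption: client-side cap, not an API limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the task is completed or failed, then return the body."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the client-side timeout")
        sleep(interval)  # "waiting" and "processing" both mean: try again later
```

The caller then inspects the returned body: a completed status carries result.download_url, while a failed status carries an error_code.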


3. Download Result

GET /api/pdf-to-markdown-proxy/download/{task_id}

Download the finished result file.

By default, successful requests return a ZIP archive containing the Markdown file and related assets. If the original parse request used images_as_url=true, this endpoint returns a Markdown file instead, and image references point to public URLs.
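
Because the response is either a ZIP archive or a single Markdown file, a client can inspect the downloaded bytes instead of tracking which mode was requested. The sketch below assumes the response body has already been read into memory; save_result and the {task_id}.md file name are hypothetical choices, and the layout inside the ZIP is not specified by this guide.

```python
import io
import zipfile
from pathlib import Path


def save_result(content: bytes, out_dir: str, task_id: str) -> list[Path]:
    """Write a downloaded result to out_dir and return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    buffer = io.BytesIO(content)
    if zipfile.is_zipfile(buffer):
        # Default mode: ZIP archive with the Markdown file and related assets.
        with zipfile.ZipFile(buffer) as archive:
            archive.extractall(out)
            return [out / name for name in archive.namelist()]
    # images_as_url=true mode: the body is the Markdown file itself.
    target = out / f"{task_id}.md"  # hypothetical file name
    target.write_bytes(content)
    return [target]
```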


Check Point Balance

GET /api/pdf-to-markdown-proxy/balance

Returns the current point balance for the API key.

Success Example

{
  "success": true,
  "points": 98,
  "api_key": "sk-xxxx..."
}

Invalid Key Example

{
  "success": false,
  "message": "Invalid API key"
}

Error Codes

HTTP Status Codes

| Status | Meaning | Description |
| --- | --- | --- |
| 401 | Unauthorized | API key is missing or invalid. |
| 402 | Insufficient points | Not enough balance to complete the operation. |
| 429 | Rate limit / queue limit | Too many active tasks are already running for this key. |
| 500 | Server error | Internal server error. See the response body for details. |

Business Error Codes

| Error Code | Meaning | Suggested Action | Points Deducted |
| --- | --- | --- | --- |
| invalid_api_key | API key is invalid or does not exist. | Check the key and try again. | No |
| insufficient_points | Not enough available points. | Add more points to the account. | No |
| no_file_found | No file was included in the request. | Make sure FormData contains the file field. | No |
| parse_file_too_large | File size exceeds the limit. | Split the PDF into smaller parts. | No |
| parse_page_limit_exceeded | PDF page count exceeds the limit. | Split the PDF into smaller parts. | No |
| parse_file_not_pdf | Uploaded file is not a PDF. | Upload a valid .pdf file. | No |
| file_upload_failed | Upload to storage failed. | Check the network and retry. | No |
| points_deduction_failed | Point deduction failed. | Contact support. | No |
| task_creation_failed | Task creation failed. | Contact support. Refunded automatically if needed. | Refunded |
| parse_error | Parsing failed. | Retry later. If the issue persists, contact support. | Refunded |
| parse_file_invalid | The PDF is malformed or invalid. | Try a different export or re-save the PDF. | Refunded |
| parse_timeout | Processing timed out. | Split the PDF and try again. | Refunded |
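
Client code can turn these codes into a small lookup that decides what to do next. The sketch below is one possible condensation of the table: the action labels (retry, split_pdf, and so on) and the triage helper are hypothetical, and sending unknown codes to support is an assumption.

```python
# error_code -> (suggested action, whether deducted points are refunded)
ERROR_GUIDE = {
    "invalid_api_key": ("fix_request", False),
    "insufficient_points": ("add_points", False),
    "no_file_found": ("fix_request", False),
    "parse_file_too_large": ("split_pdf", False),
    "parse_page_limit_exceeded": ("split_pdf", False),
    "parse_file_not_pdf": ("fix_request", False),
    "file_upload_failed": ("retry", False),
    "points_deduction_failed": ("contact_support", False),
    "task_creation_failed": ("contact_support", True),
    "parse_error": ("retry", True),
    "parse_file_invalid": ("fix_request", True),
    "parse_timeout": ("split_pdf", True),
}


def triage(body: dict) -> tuple[str, bool]:
    """Map a failed response body to (action, refunded); unknown codes go to support."""
    return ERROR_GUIDE.get(body.get("error_code"), ("contact_support", False))


print(triage({"success": False, "error_code": "parse_timeout"}))  # prints ('split_pdf', True)
```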

Parameter Notes

table_mode

| Value | Meaning |
| --- | --- |
| markdown | Convert tables into editable Markdown tables. |
| image | Keep tables as images to preserve layout. |

target_language

Supported language codes:

| Code | Language |
| --- | --- |
| zh | Chinese |
| en | English |
| ja | Japanese |
| ko | Korean |
| fr | French |
| de | German |
| es | Spanish |
| ru | Russian |

output_options

| Value | Meaning |
| --- | --- |
| original | Output the original Markdown only. |
| translated | Output the translated Markdown only. |
| bilingual | Output a bilingual version with original and translation. |

text

| Value | Meaning |
| --- | --- |
| false | Default. Full recognition mode with tables, formulas, code blocks, and other complex structures. |
| true | Pure text mode. Focuses only on text transcription. |

Workflow Summary

  1. Upload the PDF with /parse
  2. Poll /status/{task_id} until the task is complete
  3. Download the final file with /download/{task_id}
  4. Save the result locally before the retention window expires