Enhanced on 03/16/2026

Kensho Extract

Kensho Extract now offers a containerized version to support on-premises workflows.

Kensho Extract is an advanced machine learning solution that structures documents and extracts tables, text and figures quickly and reliably at scale. With its document layout analysis and table-extraction capabilities, Kensho Extract accurately organizes a document’s headers, titles, paragraphs, tables, and footers, so users can process and extract values from hundreds of pages within seconds. By making unstructured PDFs machine-readable, the solution makes it easier for business and financial professionals to use these documents in downstream applications.

With Extract, you can:

  • Implement Extract as a PDF parser to enhance your retrieval-augmented generation (RAG) pipeline for downstream GenAI use cases
  • Interpret messy page layouts, structuring text into cohesive paragraphs that can then be effectively analyzed and searched
  • Augment your human workforce with easy-to-use document extraction tools, including a browser-accessible user interface

Service Provider Information

Kensho is S&P Global’s hub for AI innovation and transformation. Kensho's mission is to help S&P Global leverage cutting-edge tech to become the world’s most trusted and innovative data, benchmarks, and ratings company.

Key Information

Use Cases

  • GenAI: Kensho and S&P Global continue to collaborate on several GenAI initiatives, incorporating Kensho Extract in document pre-processing and standardization. For example, Kensho Extract plays a key role in applications such as ChatIQ for Market Intelligence and ChatAI for Energy.
  • Textual Datasets: Kensho Extract powers S&P Global’s Machine Readable Broker Research dataset, processing millions of broker reports and structuring the text within these documents. Once the documents are structured, the dataset is delivered to clients through a textual data feed that enables downstream Natural Language Processing (NLP) workflows, including sentiment analysis and named entity recognition.
  • Export Tabular Information at Scale: Identify tables within static PDF documents and export them into user-friendly formats such as JSON, Excel or CSV
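
As a rough illustration of the export step, the sketch below serializes one already-extracted table to CSV and JSON using only the Python standard library. It does not call Kensho Extract: the `table` dictionary, its `header` and `rows` keys, and both helper functions are hypothetical stand-ins, and the cell values are made-up placeholders. The actual output schema and API of the product are not described on this page.

```python
import csv
import io
import json

# Hypothetical stand-in for one extracted table (header row plus data rows).
# This is NOT the Kensho Extract output schema; the values are placeholders.
table = {
    "header": ["Segment", "Q1", "Q2"],
    "rows": [["A", "10", "12"], ["B", "7", "8"]],
}


def table_to_csv(table: dict) -> str:
    """Serialize the header and data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table["header"])
    writer.writerows(table["rows"])
    return buffer.getvalue()


def table_to_records(table: dict) -> list:
    """Pair each cell with its column name, giving one dict per data row."""
    return [dict(zip(table["header"], row)) for row in table["rows"]]


csv_text = table_to_csv(table)
json_text = json.dumps(table_to_records(table), indent=2)

print(csv_text)
print(json_text)
```

Keeping cells as strings avoids guessing at number formats (currency symbols, thousands separators, footnote markers) that commonly appear in financial tables; any type conversion would be a separate downstream step.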

Benefits

  • Tabular Extraction Model Flexibility: Unlike other specific-use tabular extraction tools that rely more heavily on “hard-coded” rule-based logic, Kensho Extract’s machine learning (ML) model allows for high performance over a much broader range of document table types
  • Analyzes documents quickly and accurately with industry-leading processing times
  • Extracts rich, machine-readable insights for AI processing, analysis, and productivity enhancement
  • Containerized version offered for on-premises workflows

Details