Storytell Vision Ingest

Storytell Vision Ingest interprets charts, graphs, images, and flags, transforming visual content into structured, query-ready insights for a comprehensive understanding of your documents.

Written By Mark Ku

Last updated 4 months ago

Overview

Vision Ingest is Storytell's visual understanding capability. It transforms charts, graphs, images, and other visual content into structured, query-ready insights. Critical data locked inside images or visual elements is no longer missed: you get a complete understanding of your documents, with every visual as searchable and analyzable as text.

Key benefits

Unlock data hidden in visual elements that traditional text processing misses
Get complete document understanding by combining text and visual analysis
Save time by automatically processing visual content into searchable formats

How it works

Vision Ingest uses a multi-modal pipeline powered by vision-capable Large Language Models (LLMs). Unlike basic Optical Character Recognition (OCR), which only extracts text characters, Vision Ingest interprets the meaning and context of your visual content.

The process transforms visual elements into structured markdown or HTML output, optimized for both human readers and AI querying.
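
To make "structured markdown output" concrete, here is a minimal sketch of how values read from a chart could be rendered as a markdown table. This is not Storytell's implementation: the ExtractedChart class, the chart_to_markdown function, and the sample figures are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExtractedChart:
    """Hypothetical container for what a vision model might report about one chart."""
    title: str
    columns: list[str]
    rows: list[list[str]]


def chart_to_markdown(chart: ExtractedChart) -> str:
    """Render the extracted values as a markdown heading followed by a table."""
    header = "| " + " | ".join(chart.columns) + " |"
    divider = "| " + " | ".join("---" for _ in chart.columns) + " |"
    body = ["| " + " | ".join(row) + " |" for row in chart.rows]
    return "\n".join([f"### {chart.title}", "", header, divider, *body])


# Illustrative values only; they do not come from any real document.
example = ExtractedChart(
    title="Quarterly revenue",
    columns=["Quarter", "Revenue"],
    rows=[["Q1", "120"], ["Q2", "135"]],
)
markdown = chart_to_markdown(example)
print(markdown)
```

A table in this form stays readable for people, and its row and column structure gives a language model explicit cells to reference when answering a question.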

Key capabilities

Universal Visual Understanding

Vision Ingest interprets multiple types of visual content including charts, graphs, tables, images, infographics, and even symbolic elements like flags. It identifies data trends, extracts key metrics, and describes visual relationships within your documents.

Use cases:

Extracting performance metrics from dashboard screenshots
Understanding data trends in business reports
Analyzing infographics and visual summaries

Intelligent File Processing

The system handles various file formats (PDF, DOCX, PPTX, PNG, JPEG, WebP) and automatically splits large documents into manageable sections for processing.

Use cases:

Processing mixed-format document collections
Handling large presentation files with embedded visuals
Converting scanned documents with visual elements
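
The format check and document splitting described in this section can be sketched as follows. This is an illustration under stated assumptions, not Storytell's actual logic: the function names, the 10-page default section size, and the inclusion of the .jpg extension alongside .jpeg are all assumptions.

```python
from pathlib import Path

# Formats named in this section; ".jpg" is an assumed alias for JPEG.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".png", ".jpeg", ".jpg", ".webp"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension matches a supported format (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_into_sections(page_count: int, pages_per_section: int = 10) -> list[range]:
    """Split a document's zero-based page indices into consecutive sections.

    The default section size is an arbitrary value chosen for this example.
    """
    if pages_per_section < 1:
        raise ValueError("pages_per_section must be at least 1")
    return [
        range(start, min(start + pages_per_section, page_count))
        for start in range(0, page_count, pages_per_section)
    ]


# A 25-page document becomes three sections: pages 0-9, 10-19, and 20-24.
sections = split_into_sections(25)
print([list(section) for section in sections])
```

Processing fixed-size sections keeps each request to a vision model bounded, whatever the size of the original file.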

Smart Query Enhancement

Vision Ingest analyzes uploaded visual content and can suggest more precise prompts to help you get better answers about your visual data.

Use cases:

Getting guidance on how to query complex visual data
Improving question formulation for better results
Understanding what insights are available from visual content

Real-world examples

Business Dashboard Analysis

When you upload a sales report dashboard, you can ask specific questions like "What is the lead conversion ratio?" and Vision Ingest will analyze the visual elements to provide precise answers based on the charts and metrics displayed.

Example queries:

"What is the lead conversion ratio in Sales Report Dashboard Template?"
"What is the lead to opportunity ratio?"

Infographic Interpretation

For complex visuals like charts showing languages with country flags, you can ask "Based on this chart, identify the most spoken language and list the countries associated with it," and Vision Ingest will process both the data and symbolic elements to provide comprehensive answers.

You can then ask follow-up questions about the languages and countries, and Storytell answers from the chart.

Visual Data Extraction

Transform presentations, reports, or image files containing charts and graphs into queryable insights without manual data entry or conversion.

Getting started

Upload documents containing visual elements through the standard file upload process
Ask questions about charts, graphs, or images within your documents
Use natural language to query visual content just as you would text content