Storytell Vision Ingest
Storytell Vision Ingest interprets charts, graphs, images, and flags, transforming visual content into structured, query-ready insights for a comprehensive understanding of your documents.
Written By Mark Ku
Last updated 3 months ago
Overview
Vision Ingest is Storytell's advanced visual understanding capability that transforms charts, graphs, images, and other visual content into structured, query-ready insights. Instead of missing critical data locked inside images or visual elements, Vision Ingest gives you a complete understanding of your documents, making every visual as searchable and analyzable as text.
Key benefits
Unlock data hidden in visual elements that traditional text processing misses
Get complete document understanding by combining text and visual analysis
Save time by automatically processing visual content into searchable formats
How it works
Vision Ingest uses a multimodal pipeline powered by vision-capable Large Language Models (LLMs). Unlike basic Optical Character Recognition (OCR), which only extracts text characters, Vision Ingest interprets the meaning and context of your visual content.
The process transforms visual elements into structured markdown or HTML output, optimized for both human readers and AI querying.
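To make "structured markdown output" concrete, the sketch below renders values read off a chart as a markdown table. It is an illustration only: the ChartData type, the to_markdown_table helper, and the sample values are hypothetical and are not Storytell's internal format or API.

```python
from dataclasses import dataclass


@dataclass
class ChartData:
    """Hypothetical container for values a vision model read off a chart."""
    title: str
    columns: list[str]
    rows: list[list[str]]


def to_markdown_table(chart: ChartData) -> str:
    """Render extracted chart values as a markdown heading plus table."""
    header = "| " + " | ".join(chart.columns) + " |"
    divider = "| " + " | ".join("---" for _ in chart.columns) + " |"
    body = ["| " + " | ".join(row) + " |" for row in chart.rows]
    return "\n".join([f"### {chart.title}", "", header, divider, *body])


# Made-up sample values, for illustration only.
sample = ChartData(
    title="Quarterly revenue",
    columns=["Quarter", "Revenue"],
    rows=[["Q1", "120"], ["Q2", "150"]],
)
print(to_markdown_table(sample))
```

Once a chart is in a tabular text form like this, a question about it can be answered the same way as a question about any other text in the document.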
Key capabilities
Universal Visual Understanding
Vision Ingest interprets multiple types of visual content including charts, graphs, tables, images, infographics, and even symbolic elements like flags. It identifies data trends, extracts key metrics, and describes visual relationships within your documents.
Use cases:
Extracting performance metrics from dashboard screenshots
Understanding data trends in business reports
Analyzing infographics and visual summaries
Intelligent File Processing
The system handles various file formats (PDF, DOCX, PPTX, PNG, JPEG, WEBP) and automatically splits large documents into manageable sections for optimal processing.
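As a rough sketch of what format checking and document splitting might look like, the example below accepts the file types listed above and divides a long document into consecutive page ranges. The function names, the 10-page section size, and the treatment of .jpg as JPEG are assumptions made for illustration; the article does not describe how Storytell actually chooses section boundaries.

```python
from pathlib import Path

# Formats named in this article; ".jpg" is an assumed alias for JPEG.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".png", ".jpeg", ".jpg", ".webp"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_into_sections(page_count: int, pages_per_section: int = 10) -> list[range]:
    """Split pages 1..page_count into consecutive page ranges.

    The default section size is an arbitrary illustrative value.
    """
    if page_count < 0 or pages_per_section < 1:
        raise ValueError("page_count must be >= 0 and pages_per_section >= 1")
    return [
        range(start, min(start + pages_per_section, page_count + 1))
        for start in range(1, page_count + 1, pages_per_section)
    ]


# A hypothetical 25-page deck split into sections of at most 10 pages.
for section in split_into_sections(25):
    print(f"pages {section.start}-{section.stop - 1}")
```

Fixed-size page ranges are the simplest possible strategy; a real pipeline could just as well split on slide or chapter boundaries.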
Use cases:
Processing mixed-format document collections
Handling large presentation files with embedded visuals
Converting scanned documents with visual elements
Smart Query Enhancement
Vision Ingest analyzes uploaded visual content and can suggest more precise prompts to help you get better answers about your visual data.
Use cases:
Getting guidance on how to query complex visual data
Improving question formulation for better results
Understanding what insights are available from visual content
Real-world examples
Business Dashboard Analysis
When you upload a sales report dashboard, you can ask specific questions like "What is the lead conversion ratio?" and Vision Ingest will analyze the visual elements to provide precise answers based on the charts and metrics displayed.

Example queries:
"What is the lead conversion ratio in Sales Report Dashboard Template?"
"What is the lead to opportunity ratio?"

Infographic Interpretation
For complex visuals like charts showing languages with country flags, you can ask "Based on this chart, identify the most spoken language and list the countries associated with it," and Vision Ingest will process both the data and symbolic elements to provide comprehensive answers.

Visual Data Extraction
Transform presentations, reports, or image files containing charts and graphs into queryable insights without manual data entry or conversion.

Getting started
Upload documents containing visual elements through the standard file upload process
Ask questions about charts, graphs, or images within your documents
Use natural language to query visual content just as you would text content