How Storytell processes spreadsheets
Our processing engine converts raw spreadsheet data into SQL databases that can handle up to a million rows and support complex analytical queries.
Written By Mark Ku
Last updated 4 months ago
When you upload a spreadsheet-type file to Storytell, our processing pipeline transforms your raw data into AI-readable insights.
Whenever you reference at least one spreadsheet-type file in a prompt, Storytell automatically kicks off this processing pipeline.
Learn more about how to work with your data: How to manage scope
Core capabilities
Spreadsheet processing
Process datasets of up to 1 million rows
Real-world examples:
E-commerce: Customer transaction records
Healthcare: Clinical trial data with patient interactions
Manufacturing: Quality control logs
Finance: Daily trading data with millions of transactions per file
Cross-tab analysis
Work seamlessly with multi-sheet Excel files as if they were a unified database.
Example scenario:
A retail chain's inventory system with:
Products tab: SKUs, descriptions, categories, pricing
Suppliers tab: Vendor information, lead times, contact details
Orders tab: Purchase history, quantities, delivery dates
Stores tab: Location data, capacity, regional groupings
An example prompt you could run against this data:
Which suppliers had the highest delivery delays for electronics products in Q3, and how did this impact store inventory levels in the Northwest region?
Storytell automatically generates complex JOINs across all tabs to provide a comprehensive analysis.
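The kind of cross-tab JOIN described above can be sketched with an in-memory SQLite database. The table and column names here (suppliers, products, orders, delay_days) are illustrative assumptions, not Storytell's actual schema:

```python
import sqlite3

# Illustrative schema mirroring the Products / Suppliers / Orders tabs above;
# every table and column name here is an assumption for the sketch.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, category TEXT,
                        supplier_id INTEGER REFERENCES suppliers(supplier_id));
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY,
                        product_id INTEGER REFERENCES products(product_id),
                        delay_days INTEGER);
INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO products  VALUES (10, 'electronics', 1), (11, 'electronics', 2);
INSERT INTO orders    VALUES (100, 10, 7), (101, 10, 9), (102, 11, 1);
""")

# One query spans all three "tabs": average delivery delay per supplier
# for electronics products, worst first.
rows = con.execute("""
    SELECT s.supplier_name, AVG(o.delay_days) AS avg_delay
    FROM orders o
    JOIN products  p ON p.product_id  = o.product_id
    JOIN suppliers s ON s.supplier_id = p.supplier_id
    WHERE p.category = 'electronics'
    GROUP BY s.supplier_name
    ORDER BY avg_delay DESC
""").fetchall()
print(rows)  # [('Acme', 8.0), ('Globex', 1.0)]
```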
How it works
Stage 1: File analysis and classification
Storytell employs AI-powered analysis to understand your data's structure and optimize it for intelligent querying.
Intelligent structure detection
Our processing engine uses large language models to perform comprehensive analysis of your spreadsheet files:
Deep content analysis
AI examines the file structure, identifying header patterns, data types, and organizational logic across all rows and columns
Example: A marketing report with headers like "Q1 Revenue", "Q2 Revenue" gets recognized as quarterly financial data, while "Customer_ID", "First_Name", "Last_Name" is identified as customer demographic information.
Metadata extraction
The system automatically identifies and preserves contextual information such as:
Temporal data (date ranges, reporting periods)
Geographic identifiers (locations, regions)
Source information and processing notes
Technical parameters and methodology details
Example: From a sales report header showing "Sales Data - West Region - Q3 2024 - Preliminary Results", the system extracts:
• Geographic: "West Region"
• Temporal: "Q3 2024"
• Source: "Sales Data"
• Status: "Preliminary Results"
Column intelligence
Advanced inference determines the purpose and content of each data column, generating clean, descriptive names suitable for querying
Example: A column labeled "Cust ID" becomes "customer_id", "Rev $" becomes "revenue_amount", and "# Orders" becomes "order_count", making them much easier to query.
Multi-header processing
Handles complex spreadsheets with multi-row headers by combining information from all relevant rows to create complete column descriptions
Example: A financial report with:
Row 1: "2024 Performance Metrics"
Row 2: "Revenue | Expenses | Profit"
Row 3: "USD | USD | USD"
gets processed as: "2024_performance_revenue_usd", "2024_performance_expenses_usd", "2024_performance_profit_usd"
Excel multi-sheet analysis
For Excel files with multiple worksheets, Storytell's AI:
Cross-sheet relationship detection
Identifies logical connections and foreign key relationships between different sheets
Example: In an e-commerce workbook:
• "Customers" sheet has a "customer_id" column
• "Orders" sheet has a "customer_id" column
• The system automatically links these as a foreign key relationship
Unified schema creation
Builds database relationships automatically across all sheets, enabling complex cross-sheet queries
Example: A retail inventory system with sheets for "Products", "Suppliers", "Orders", and "Stores" gets unified so you can ask: "Which suppliers had delivery delays that affected inventory levels in our Northwest stores?"
Smart data integration
Links related tables based on naming conventions and content analysis (e.g., "customers" and "customer_orders")
Example: Sheets named "employee_data" and "employee_performance" are automatically recognized as related, even if the linking column is "emp_id" in one sheet and "employee_number" in another.
Stage 2: Database creation and optimization
Each structured file becomes a dedicated, high-performance database instance optimized for analytical queries.
Database architecture:
Storage: Columnar format for faster aggregations
Memory management: Intelligent caching based on query patterns
Indexing: Automatic index creation on frequently queried columns
Example:
File: customer_transactions.csv (2.1M rows, 47 columns)
Processing time: ~45 seconds
Database creation:
✓ 15 automatic indexes on key columns
✓ Data type optimization (reduced storage by 40%)
✓ Constraint detection (identified 3 foreign key relationships)
✓ Query optimization statistics generated
Handling complex Excel files:
Example:
File: Financial_Model_2024.xlsx (12 tabs, 890K total rows)
Tab processing:
✓ "Income_Statement" → table with YoY calculations preserved
✓ "Balance_Sheet" → linked to Income_Statement via period_id
✓ "Cash_Flow" → integrated with both financial statements
✓ "Assumptions" → metadata table linked to all calculations
Cross-tab relationships: 23 automatic JOIN candidates identified
Stage 3: SQL generation engine
When you ask questions, Storytell's AI analyzes your intent and generates SQL queries tailored to your specific data structure.
Query intelligence capabilities:
Semantic understanding: Translates business terms to database columns
Context awareness: Uses file metadata and previous queries for better accuracy
Performance optimization: Automatically chooses efficient query patterns
Complex logic: Handles multi-step analytical reasoning
Real-world query examples:
Simple aggregation:
Example:
Question: "What's our average order value by region?"
Generated SQL:
SELECT
store_location,
ROUND(AVG(amount), 2) as avg_order_value,
COUNT(*) as total_orders
FROM customer_transactions
GROUP BY store_location
ORDER BY avg_order_value DESC;
Complex multi-tab analysis:
Example:
Question: "Which product categories had declining margins in Q3, and what suppliers were involved?"
Generated SQL:
WITH quarterly_margins AS (
  SELECT
    p.category,
    s.supplier_name,
    DATE_TRUNC('quarter', o.order_date) AS quarter,
    ROUND(AVG(p.margin_percent), 2) AS avg_margin
  FROM products p
  JOIN orders o ON p.product_id = o.product_id
  JOIN suppliers s ON o.supplier_id = s.supplier_id
  GROUP BY p.category, s.supplier_name, DATE_TRUNC('quarter', o.order_date)
),
margin_trends AS (
  SELECT *,
    LAG(avg_margin) OVER (
      PARTITION BY category, supplier_name
      ORDER BY quarter
    ) AS prev_quarter_margin
  FROM quarterly_margins
)
SELECT category, supplier_name, avg_margin, prev_quarter_margin
FROM margin_trends
WHERE quarter = DATE '2024-07-01'
  AND avg_margin < prev_quarter_margin
ORDER BY (prev_quarter_margin - avg_margin) DESC;
Window functions and advanced analytics:
Example:
Question: "Show me the top 3 customers by revenue in each region, with their growth trends"
Generated SQL:
WITH quarterly_revenue AS (
  SELECT
    customer_id,
    store_location AS region,
    DATE_TRUNC('quarter', transaction_date) AS quarter,
    SUM(amount) AS quarter_revenue,
    COUNT(*) AS order_count
  FROM customer_transactions
  GROUP BY customer_id, store_location, DATE_TRUNC('quarter', transaction_date)
),
customer_metrics AS (
  SELECT *,
    ROUND(
      (quarter_revenue - LAG(quarter_revenue) OVER (
        PARTITION BY customer_id
        ORDER BY quarter
      )) * 100.0 / LAG(quarter_revenue) OVER (
        PARTITION BY customer_id
        ORDER BY quarter
      ), 2
    ) AS growth_percent
  FROM quarterly_revenue
),
ranked_customers AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY region, quarter
      ORDER BY quarter_revenue DESC
    ) AS revenue_rank
  FROM customer_metrics
)
SELECT * FROM ranked_customers WHERE revenue_rank <= 3;
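The top-N-per-group pattern in the generated SQL above can be tried end to end against an in-memory SQLite database; the sample customers, regions, and amounts below are made up for illustration:

```python
import sqlite3

# Tiny stand-in for customer_transactions; values are invented for the demo.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_transactions (customer_id TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO customer_transactions VALUES (?, ?, ?)", [
    ("c1", "West", 500), ("c2", "West", 300), ("c3", "West", 200), ("c4", "West", 100),
    ("c5", "East", 900), ("c6", "East", 400),
])

# Same pattern as above: rank customers by total revenue within each region,
# then keep only the top 3 per region ("c4" is rank 4 and drops out).
top3 = con.execute("""
    WITH totals AS (
        SELECT customer_id, region, SUM(amount) AS total_revenue
        FROM customer_transactions
        GROUP BY customer_id, region
    ),
    ranked AS (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY region ORDER BY total_revenue DESC
        ) AS revenue_rank
        FROM totals
    )
    SELECT customer_id, region, revenue_rank
    FROM ranked WHERE revenue_rank <= 3
    ORDER BY region, revenue_rank
""").fetchall()
print(top3)
```

Note this requires SQLite 3.25+ for window function support, which ships with any recent Python.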
Stage 4: Results processing and presentation
Storytell converts SQL results into conversational insights while maintaining data precision and enabling follow-up analysis.
Results enhancement:
Contextual formatting: Automatically formats currencies, percentages, dates
Statistical insights: Adds trend analysis and comparative metrics
Visual preparation: Structures data optimally for chart generation
Citation tracking: Maintains links between insights and source data
Example output transformation:
Example:
Raw SQL Result:
store_location | avg_order_value | total_orders
Northeast | 847.32 | 15,432
Southeast | 623.19 | 22,108
West Coast | 934.67 | 18,944
Storytell Enhancement:
"Your regional performance shows significant variation:
**West Coast leads in average order value** at $934.67 per transaction across 18,944 orders, generating approximately $17.7M in total revenue.
**Northeast shows premium positioning** with $847.32 average orders (15,432 transactions), suggesting a high-value customer base.
**Southeast demonstrates volume strength** with 22,108 orders despite lower $623.19 average values, indicating broad market penetration.
The 50% difference between highest and lowest average order values suggests opportunities for pricing optimization or targeted product positioning."
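The approximate $17.7M figure quoted for the West Coast is simply the average order value multiplied by the order count from the raw result:

```python
# Sanity check: avg order value x order count from the raw SQL result above
west_total = 934.67 * 18944
print(f"${west_total:,.2f}")  # $17,706,388.48, i.e. roughly $17.7M
```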
Technical capabilities and scale
Processing capabilities:
Maximum rows: 1 million
Maximum columns: unlimited
Processing time: ~5-10 minutes per million rows
Complex data structure handling
Multi-header row processing:
Example: Survey data with nested categories
Row 1: [blank] | "Demographics" | [blank] | "Preferences" | [blank]
Row 2: [blank] | "Age" | "Income" | "Product" | "Service"
Row 3: "ID" | "Years" | "USD" | "Rating" | "Rating"
Storytell creates:
- Column: participant_id (from "ID")
- Column: age_years (from "Demographics > Age > Years")
- Column: income_usd (from "Demographics > Income > USD")
- Column: product_rating (from "Preferences > Product > Rating")
- Column: service_rating (from "Preferences > Service > Rating")
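A toy version of this multi-header flattening can be written in a few lines of Python. Note this sketch keeps every header level in the combined name (e.g. demographics_age_years), whereas Storytell's actual naming may drop redundant levels (the example above shows age_years); the forward-fill of blank cells and the snake_case join are the ideas being illustrated:

```python
import re

def flatten_headers(header_rows):
    """Combine multi-row headers into snake_case column names,
    carrying group labels (e.g. "Demographics") across blank cells."""
    filled = []
    for row in header_rows:
        carried, out = "", []
        for cell in row:
            carried = cell or carried  # forward-fill blanks with last label
            out.append(carried)
        filled.append(out)
    names = []
    for parts in zip(*filled):
        # Drop empty and duplicate parts while keeping order
        seen, kept = set(), []
        for p in parts:
            if p and p not in seen:
                seen.add(p)
                kept.append(p)
        name = "_".join(kept)
        names.append(re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_"))
    return names

rows = [
    ["", "Demographics", "", "Preferences", ""],
    ["", "Age", "Income", "Product", "Service"],
    ["ID", "Years", "USD", "Rating", "Rating"],
]
print(flatten_headers(rows))
# ['id', 'demographics_age_years', 'demographics_income_usd',
#  'preferences_product_rating', 'preferences_service_rating']
```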
Embedded metadata preservation:
Example: Financial report with calculation context
Data includes: "Revenue figures exclude extraordinary items (see note 3)"
Storytell preserves: Context available for AI to reference in explanations
Query result: "Q3 revenue was $2.3M (excluding extraordinary items per note 3)"
Multi-tab Excel processing
Relationship detection algorithms:
Primary key identification: Detects unique identifiers across tabs
Foreign key mapping: Links related data between sheets automatically
Naming pattern analysis: Connects "customer" tab to "customer_orders" tab
Cell reference parsing: Follows Excel formulas to understand dependencies
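The first two heuristics above can be sketched in a few lines: treat a column whose values are all unique as a primary key candidate, and link it to any other sheet that shares the column name. This is a simplified, assumption-level sketch, not Storytell's actual algorithm (which also uses naming patterns and formula references):

```python
def detect_links(sheets):
    """sheets: {sheet_name: {column_name: [values]}}.
    Returns (referencing_sheet, column, referenced_sheet) tuples."""
    links = []
    for name, cols in sheets.items():
        for col, values in cols.items():
            if len(set(values)) == len(values):  # unique -> primary key candidate
                for other, other_cols in sheets.items():
                    if other != name and col in other_cols:
                        # other.col looks like a foreign key into name.col
                        links.append((other, col, name))
    return links

sheets = {
    "customers": {"customer_id": [1, 2, 3], "name": ["A", "B", "C"]},
    "orders": {"order_id": [10, 11], "customer_id": [1, 1]},
}
print(detect_links(sheets))  # [('orders', 'customer_id', 'customers')]
```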
Cross-tab query capabilities:
Example:
Scenario: HR database with 8 tabs
- employees (2,400 rows): Basic employee information
- salaries (14,000 rows): Historical compensation data
- departments (45 rows): Organizational structure
- projects (890 rows): Project assignments and timelines
- reviews (7,200 rows): Performance evaluation history
- training (12,300 rows): Skills development records
- benefits (3,100 rows): Healthcare and retirement data
- locations (23 rows): Office and remote work data
Complex query: "Which remote employees in the engineering department had salary increases above 8% last year while also completing AI-related training?"
Storytell automatically generates a 5-table JOIN across multiple tabs.
Pro-tips for working with Excel/CSV Files
Optimizing Files for Best Performance
For best results, structure your files as databases:
✅ DO:
Use consistent headers
Keep each row as a complete record
Maintain consistent data types per column
Use descriptive column names (avoid "Column1", "Data", etc.)
Place lookup tables in separate tabs with clear relationships
Include date columns in YYYY-MM-DD format when possible
Multi-tab Excel optimization:
Place lookup tables (categories, locations) in separate tabs
Use consistent column naming across related tabs
Include header rows in every tab
Avoid merged cells in data regions (headers are OK for simple cases)
Limitations and considerations
Data structure requirements
Files that work best with the database engine:
Clean tabular format: Clear headers followed by consistent data rows
Database-like structure: Each row represents a record, each column a field
Consistent columns: Same data types within each column
Multiple related tables: Excel files with related tabs that can be JOINed
Complex data handling
Files that may require alternative processing:
Survey research exports: Complex multi-question layouts with varying column structures
Financial reports: Heavily formatted reports with merged cells and embedded calculations
Pivot table exports: Grouped data with subtotals and multiple aggregation levels
Dashboard layouts: Mixed content with charts, summaries, and data tables
Research reports: Files containing dozens of different datasets in a single sheet
Real example of complex structure:
Example:
MQ18B_Lr1: You mentioned you Started using a brand because it was popular...
Total Gen Z Non Gen Z Teen Z 13-17 Young Adult Z 18-23
* Life-stage 11% 8% 15% 6% 6%
* Professional/academic 7% 8% 5% 8% 9%
* Identity-based 4% 5% 3% 2% 6%
This type of complex survey report contains multiple questions with different column structures. For such files, Storytell has a fallback mechanism that attempts to extract meaningful information as text chunks, though this results in fidelity loss compared to structured processing and may miss relationships between data points.
File format constraints
Excel limitations: 30 tab maximum (Excel's practical limit for performance)
Encoding issues: Non-UTF-8 CSV files require automatic conversion
Formula preservation: Excel formulas become calculated values (not dynamic)
Conditional formatting: Visual formatting is lost, data values preserved
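For the encoding point above, one common normalization pattern is to try UTF-8 first and fall back to Latin-1, which decodes any byte sequence. This is a generic sketch, not necessarily Storytell's conversion logic:

```python
# Simulate a CSV exported with Latin-1 encoding (common from older tools)
raw = "région,ventes\nOuest,42\n".encode("latin-1")

try:
    text = raw.decode("utf-8")       # preferred: assume UTF-8
except UnicodeDecodeError:
    text = raw.decode("latin-1")     # fallback: Latin-1 never fails

print(text.splitlines()[0])  # région,ventes
```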