Document Processing

When you upload a document to LOQI, it goes through a multi-step processing pipeline to make it searchable and citable by the AI.

Processing stages

Stage	What happens
Reading	Text is extracted from the file (PDF, Word, Excel, etc.)
Organizing	The text is split into meaningful chunks (paragraphs, sections)
Indexing	Each chunk is converted to a vector embedding for semantic search
Complete	The document is ready for AI use

A progress indicator shows the current stage and percentage.
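
To illustrate the Organizing stage, here is a minimal sketch of paragraph-based chunking. It is not LOQI's actual implementation: the function name `split_into_chunks`, the `max_chars` parameter, and the rule of splitting on blank lines are all assumptions made for this example.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: paragraphs are assumed to be separated by blank
    lines. A single paragraph longer than max_chars is kept whole rather
    than cut mid-sentence.
    """
    # Split on one or more blank lines and drop empty pieces.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline like the one described above, each returned chunk would then be passed to an embedding model during the Indexing stage.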

Supported file types

Type	Extensions	Notes
PDF	.pdf	Includes scanned PDFs (processed via vision extraction)
Word	.docx	Full text and formatting preserved
Excel	.xlsx	Sheet content extracted as text
CSV	.csv	Tabular data extracted
Images	.png, .jpg	Text extracted via vision AI
Text	.txt	Direct text ingestion
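
If you prepare uploads with a script, a check like the following can flag unsupported files before you upload them. This is a sketch based only on the extensions in the table above; the names `SUPPORTED_EXTENSIONS` and `document_type` are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the table above, mapped to their listed type.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".csv": "CSV",
    ".png": "Images",
    ".jpg": "Images",
    ".txt": "Text",
}


def document_type(filename: str) -> Optional[str]:
    """Return the listed type for a filename, or None if it is not in the table."""
    # Assumption: extensions are compared case-insensitively.
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())
```

For example, `document_type("Report.PDF")` returns `"PDF"`, while `document_type("slides.pptx")` returns `None`.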

Processing time depends on:

File size — Larger files take longer
File type — Scanned PDFs (image-based) take longer than text-based PDFs
Number of pages — A 10-page PDF processes in under a minute; a 200-page document may take several minutes

For scanned PDFs (where text cannot be extracted directly), LOQI reads the text from the page images using vision extraction.

This takes longer but ensures even scanned documents are fully searchable.

If a document shows a failed status:

File size limits

The maximum upload size is 150 MB per file. For larger documents, split them into smaller parts before uploading.
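
A quick local check can tell you whether a file needs splitting before you upload it. The sketch below assumes that 1 MB means 1,048,576 bytes and that a file of exactly 150 MB is accepted; the source does not specify either, and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 150 * 1024 * 1024


def exceeds_upload_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is over the 150 MB limit."""
    return size_bytes > MAX_UPLOAD_BYTES


def needs_splitting(path: str) -> bool:
    """Check a file on disk against the upload limit."""
    return exceeds_upload_limit(Path(path).stat().st_size)
```

Calling `needs_splitting("contract.pdf")` on a local file returns `True` when the file should be split into smaller parts first.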