Document Processing
When you upload a document to LOQI, it passes through a four-stage processing pipeline that makes it searchable and citable by the AI.
Processing stages
| Stage | What happens |
|---|---|
| Reading | Text is extracted from the file (PDF, Word, Excel, etc.) |
| Organizing | The text is split into meaningful chunks (paragraphs, sections) |
| Indexing | Each chunk is converted to a vector embedding for semantic search |
| Complete | The document is ready for AI use |
A progress indicator shows the current stage and percentage.
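To make the Organizing stage more concrete, here is a minimal sketch of paragraph-based chunking. It is an illustration only, not LOQI's actual chunker: the function name `split_into_chunks` and the `max_chars` limit are assumptions chosen for the example.

```python
import re


def split_into_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split text on blank lines, then pack paragraphs into chunks.

    Each chunk holds whole paragraphs and stays within max_chars,
    except that a single paragraph longer than max_chars is kept whole.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            # Start a new chunk, or keep growing the current one.
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline like the one described above, each returned chunk would then be passed to the Indexing stage to be converted into an embedding.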
Supported file types
| Type | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Includes scanned PDFs (processed via vision extraction) |
| Word | .docx | Full text and formatting preserved |
| Excel | .xlsx | Sheet content extracted as text |
| CSV | .csv | Tabular data extracted |
| Images | .png, .jpg | Text extracted via vision AI |
| Text | .txt | Direct text ingestion |
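If you prepare uploads with a script, a quick extension check can catch unsupported files before they are sent. The sketch below is a hypothetical client-side helper, not part of LOQI: it covers only the extensions named on this page, and the names `SUPPORTED_EXTENSIONS` and `is_supported` are assumptions.

```python
from pathlib import Path

# Extensions named on this page; other variants (such as .jpeg) are
# not listed here, so this sketch treats them as unsupported.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".png", ".jpg", ".txt"}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so `report.PDF` passes, while a file such as `notes.md` does not.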
Processing time
Processing time depends on:
- File size: larger files take longer
- File type: scanned (image-based) PDFs take longer than text-based PDFs
- Number of pages: a 10-page PDF processes in under a minute; a 200-page document may take several minutes
Scanned documents
For scanned PDFs (where text cannot be extracted directly):
- LOQI uses vision AI to read each page
- Text is extracted from the images
- The document then goes through the normal chunking and indexing pipeline
This takes longer but ensures even scanned documents are fully searchable.
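The fallback described above can be sketched as a per-page decision: try direct text extraction first, and use vision extraction only when a page yields little or no text. This is an assumption about how such a fallback could work, not a description of LOQI's internals. The callables `extract_text` and `vision_ocr` and the `min_chars` threshold are hypothetical placeholders.

```python
from typing import Callable


def extract_pages(
    pages: list[bytes],
    extract_text: Callable[[bytes], str],  # hypothetical direct text extractor
    vision_ocr: Callable[[bytes], str],    # hypothetical vision-based reader
    min_chars: int = 20,                   # assumed threshold for "no usable text"
) -> list[str]:
    """Return one text string per page, using vision only when needed."""
    texts: list[str] = []
    for page in pages:
        text = extract_text(page).strip()
        if len(text) < min_chars:
            # Too little text was found, so treat the page as a scanned image.
            text = vision_ocr(page).strip()
        texts.append(text)
    return texts
```

Because the slower vision step runs only on pages without a usable text layer, a design like this would keep text-based documents fast while still covering scans.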
Troubleshooting
If a document shows a failed status:
- Check that the file is not corrupted or password-protected
- Try re-uploading the file
- Very large files may time out during processing; try splitting them into smaller parts (uploads are limited to 150 MB per file)
File size limits
The maximum upload size is 150 MB per file. For larger documents, split them into smaller parts before uploading.
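A pre-upload size check can be sketched as follows. This is a hypothetical client-side helper, not part of LOQI. It assumes 1 MB means 1024 × 1024 bytes, which this page does not specify, and the names `MAX_UPLOAD_BYTES`, `fits_upload_limit`, `parts_needed`, and `check_file` are illustrative.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes; the page does not define the unit.
MAX_UPLOAD_BYTES = 150 * 1024 * 1024


def fits_upload_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is within the upload limit."""
    return size_bytes <= MAX_UPLOAD_BYTES


def parts_needed(size_bytes: int) -> int:
    """Rough lower bound on how many parts a file must be split into."""
    return max(1, -(-size_bytes // MAX_UPLOAD_BYTES))  # ceiling division


def check_file(path: str) -> bool:
    """Check a file on disk against the upload limit."""
    return fits_upload_limit(os.path.getsize(path))
```

For a 400 MB file, `parts_needed` returns 3. The real number of parts may be higher, since documents are usually split at page or section boundaries and not at exact byte offsets.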