
Python, TF-IDF, FastAPI, SQLite, Search
InsightIndex Search Engine
Index PDFs, HTML, and code locally: TF-IDF ranking and a dashboard, with no hosted search service required.
Problem
- Finding information across personal docs + code shouldn’t require a hosted search stack.
- You want relevance ranking and observability, not just keyword grep.
Build
- Local crawler/indexer building an inverted index in SQLite.
- TF-IDF ranking + FastAPI endpoints + dashboard for stats and highlighted results.
Outcome
- Fast local search over mixed file types with explainable ranking signals.
- A dashboard to understand index coverage and query behavior.
The thread
InsightIndex is a personal search engine that crawls your local files—code, documents, web pages, and PDFs—and converts them into a structured inverted index stored in SQLite.
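As a rough illustration of what an inverted index in SQLite can look like, here is a minimal sketch. The table names (documents, postings), the tokenizer, and the helper functions are assumptions made for this example, not the project's actual schema.

```python
import re
import sqlite3
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9_]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())

def create_schema(conn):
    # Hypothetical schema: one row per document, one row per (term, document) pair.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            path TEXT UNIQUE NOT NULL,
            length INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS postings (
            term TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            term_count INTEGER NOT NULL,
            PRIMARY KEY (term, doc_id)
        );
    """)

def index_document(conn, path, text):
    """Store one document and its per-term counts; return the document id."""
    tokens = tokenize(text)
    cur = conn.execute(
        "INSERT INTO documents (path, length) VALUES (?, ?)", (path, len(tokens))
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO postings (term, doc_id, term_count) VALUES (?, ?, ?)",
        [(term, doc_id, count) for term, count in Counter(tokens).items()],
    )
    conn.commit()
    return doc_id

def lookup(conn, term):
    """Return (path, term_count) pairs for every document containing the term."""
    return conn.execute(
        "SELECT d.path, p.term_count FROM postings p "
        "JOIN documents d ON d.id = p.doc_id WHERE p.term = ? ORDER BY d.path",
        (term.lower(),),
    ).fetchall()

conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
create_schema(conn)
index_document(conn, "notes/search.md", "Search engines rank documents. Search is fun.")
index_document(conn, "src/index.py", "def build_index(documents): pass")
print(lookup(conn, "search"))
```

Storing the per-document term count and the document length alongside each posting keeps everything a TF-IDF scorer needs within reach of a single query.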
The engine ranks results with a plain TF-IDF (Term Frequency–Inverse Document Frequency) algorithm: a term counts for more the more often it appears in a document and the rarer it is across the index, which surfaces the most relevant documents and snippets for each query.
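One common way to compute TF-IDF is sketched below. It assumes tf = (term count) / (document length) and idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. The exact weighting InsightIndex uses is not stated here, so the function name and formula variant are illustrative only.

```python
import math
from collections import Counter

def tfidf_rank(query_terms, docs):
    """Rank documents by summed TF-IDF over the query terms.

    docs maps a document name to its list of tokens (assumed already tokenized).
    """
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    scores = {}
    for term in query_terms:
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue  # a term absent from the index contributes nothing
        idf = math.log(n_docs / df)
        for name, c in counts.items():
            if term in c:
                tf = c[term] / len(docs[name])
                scores[name] = scores.get(name, 0.0) + tf * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "a.txt": ["sqlite", "index", "sqlite"],
    "b.txt": ["fastapi", "index"],
    "c.txt": ["dashboard", "charts"],
}
ranking = tfidf_rank(["sqlite", "index"], docs)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With this variant, a term that occurs in every document gets idf = ln(1) = 0 and so has no effect on the ranking. Because each score is a sum of per-term contributions, it can be broken down term by term when explaining why a document ranked where it did.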
On top of the core engine, a FastAPI backend and a dark glassmorphism dashboard visualize index statistics, TF-IDF rankings, and search results with highlighted context.
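Highlighted context can be produced in many ways. The sketch below shows one simple approach: take a window of characters around the first match and wrap every whole-word match inside it in a marker. The function name, the window width, and the ** marker are assumptions for illustration; a dashboard would more likely emit HTML tags.

```python
import re

def highlight_snippet(text, terms, width=40, marker="**"):
    """Return a window around the first match, with matches wrapped in marker.

    Returns None when no term occurs in the text as a whole word.
    """
    if not terms:
        return None
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    first = pattern.search(text)
    if first is None:
        return None
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    pieces = []
    cursor = start
    # Match against the full text so word boundaries are judged correctly,
    # then keep only the matches that fall entirely inside the window.
    for m in pattern.finditer(text):
        if m.start() < start or m.end() > end:
            continue
        pieces.append(text[cursor:m.start()])
        pieces.append(marker + m.group(0) + marker)
        cursor = m.end()
    pieces.append(text[cursor:end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(pieces) + suffix

text = "InsightIndex stores an inverted index in SQLite and ranks results with TF-IDF."
print(highlight_snippet(text, ["sqlite"], width=10))
```

Matching is case-insensitive and whole-word, so a query for "index" would not highlight the tail of "InsightIndex".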
Architecture Overview
- Python indexing pipeline building an SQLite-based inverted index
- TF-IDF scoring for ranking documents and snippets
- FastAPI + Uvicorn backend exposing JSON search endpoints
- Vanilla JS/HTML/CSS dashboard with analytics and charts