
Engineering · 2025

Local paper-research assistant

The motivation is to get higher-order academic analysis out of a local, self-hosted model rather than a hosted API, while keeping everything inside a personal Obsidian knowledge graph. The target domain is the intersection of large language models and just-in-time adaptive health interventions, so the corpus deliberately spans both biomedical literature through PubMed and AI and computer-science work through arXiv. The aim is explicitly not summarization: the value is in critical analysis, deriving research directions from papers' stated future work, finding gaps across multiple papers, and generating new research ideas.

The system is a pipeline-and-graph hybrid. A single model harness is the only entry point to the local inference server, so every model call is funneled through one place; the configured model is a local Gemma, with a smaller fallback. A natural-language question goes through an intent parser that can ask clarifying questions, then a search strategy that hits API sources first (PubMed, arXiv, Semantic Scholar) and falls back to browser automation for Google Scholar and page content. A snowball searcher walks citations across multiple depths with relevance filtering. The analysis pipeline runs in stages with context compaction between them: structural analysis per section, critical analysis, future-work direction setting on single papers, then gap analysis and ideation across multiple papers, ending in synthesis. Results are persisted three ways at once: a SQLite database of metadata and analyses, a graph of paper relationships, and Obsidian markdown with wikilinks and frontmatter.
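The snowball step described above can be sketched as a breadth-first walk over citation links with a depth cap and a relevance filter. This is a minimal illustration, not the project's actual code: the function name `snowball`, the `get_citations` and `is_relevant` callbacks, and the default `max_depth` are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Iterable


def snowball(
    seeds: Iterable[str],
    get_citations: Callable[[str], Iterable[str]],  # hypothetical lookup: paper id -> cited ids
    is_relevant: Callable[[str], bool],             # hypothetical relevance filter
    max_depth: int = 2,                             # assumed default, not from the source
) -> dict[str, int]:
    """Breadth-first citation walk; returns paper id -> depth at which it was found."""
    found: dict[str, int] = {}
    queue: deque[tuple[str, int]] = deque()
    for seed in seeds:
        if seed not in found:
            found[seed] = 0
            queue.append((seed, 0))

    while queue:
        paper_id, depth = queue.popleft()
        if depth >= max_depth:
            continue  # keep the paper, but do not expand its citations further
        for cited in get_citations(paper_id):
            # Skip papers already seen and papers the filter rejects;
            # rejected papers are never expanded, which prunes their subtree.
            if cited in found or not is_relevant(cited):
                continue
            found[cited] = depth + 1
            queue.append((cited, depth + 1))
    return found


# Toy citation graph standing in for API results.
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"]}
result = snowball(
    ["A"],
    get_citations=lambda p: toy_graph.get(p, []),
    is_relevant=lambda p: p != "C",  # pretend "C" is off-topic
    max_depth=2,
)
print(result)  # {'A': 0, 'B': 1, 'D': 2}
```

Filtering before enqueueing means an off-topic paper also cuts off everything reachable only through it ("E" here), which keeps the walk from fanning out but can miss relevant work cited by irrelevant papers.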

What sets it apart is the layer built on top to make a local model trustworthy for unattended work. A session system tracks long-running research topics and short work sessions. An autonomous daemon detects idle time, generates its own research tasks, executes them through the existing orchestrator, and runs results through a verification pipeline. That verification layer is concrete code: a grounding checker that extracts key phrases from each claim and checks the fraction that actually appear in the source text against a minimum ratio, plus self-consistency checking, drift detection, and an audit-document writer. This directly addresses the central risk of letting a model read and synthesize literature on its own, namely confident but unsupported claims.
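A grounding check of this kind might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: key-phrase extraction is reduced to lowercase content words, and the names `key_phrases`, `grounding_ratio`, `is_grounded`, the stopword list, and the `0.6` threshold are all hypothetical.

```python
import re

# Assumed, deliberately tiny stopword list; a real extractor would be richer.
STOPWORDS = frozenset({"the", "and", "for", "with", "that", "this", "from", "was", "were", "are"})
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def key_phrases(claim: str) -> list[str]:
    """Crude stand-in for key-phrase extraction: content words longer than two characters."""
    return [w for w in TOKEN_RE.findall(claim.lower()) if len(w) > 2 and w not in STOPWORDS]


def grounding_ratio(claim: str, source: str) -> float:
    """Fraction of the claim's key phrases that appear verbatim in the source."""
    phrases = key_phrases(claim)
    if not phrases:
        return 0.0  # nothing checkable, so treat the claim as unsupported
    source_tokens = set(TOKEN_RE.findall(source.lower()))
    hits = sum(1 for p in phrases if p in source_tokens)
    return hits / len(phrases)


def is_grounded(claim: str, source: str, min_ratio: float = 0.6) -> bool:
    """min_ratio is an assumed threshold, not a value taken from the project."""
    return grounding_ratio(claim, source) >= min_ratio


SOURCE = (
    "The intervention used a large language model to tailor messages "
    "and increased step counts in a two-week pilot."
)
supported = "The pilot used a language model to tailor messages."
fabricated = "The trial reduced hospital readmissions by forty percent."
misread = "The language model decreased step counts in the pilot."

for claim in (supported, fabricated, misread):
    print(f"{grounding_ratio(claim, SOURCE):.2f}  {is_grounded(claim, SOURCE)}  {claim}")
```

In this toy run the fabricated claim shares no key phrases with the source and is rejected, while the misread claim passes because only one word ("decreased") differs. A purely lexical check cannot see that the meaning has been inverted.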

For honest context, this is a personal research tool rather than a hardened product. It depends on a running local inference server and a populated Obsidian vault, and some ingestion paths (Google Scholar in particular) rely on browser scraping that is inherently fragile against site changes and rate limits. The grounding heuristic is a lexical phrase-overlap check rather than semantic entailment, so it catches blatant fabrication better than subtle misreadings. The test suite is broad, covering ingestion, the pipeline, verification, sessions, and export, but output quality ultimately tracks the capability of the local model behind the harness.
