RelevanCite: your personal research-validating AI assistant

Motivation

Academic research depends on accurate citations, yet verifying whether a referenced paper truly supports a claim is tedious and error-prone. In a large literature review, checking every claim by hand becomes nearly impossible. RelevanCite was created to address this challenge, aiming to reduce errors, save researchers time, and strengthen the integrity of scholarly work.

Introduction

RelevanCite is an automated system that checks whether the citation claims in a manuscript are actually supported by the referenced research papers. Unlike tools that rely on cloud services, RelevanCite runs completely offline, using local models to preserve privacy and keep sensitive research data secure. All downloaded papers are stored locally, and all processing happens on the user's machine. It is designed to be useful to authors, reviewers, and editors alike.

How It Works

The system processes manuscripts through a series of modular steps:

  1. Citation Extraction
    GROBID parses each document into structured metadata, including references, authors, and other bibliographic information.

  2. Full-Text Retrieval
    Papers are automatically retrieved from open-access sources or manually uploaded and stored locally.

  3. Claim Verification
    Claims from the manuscript are compared with the relevant sections of cited papers using vector embeddings and similarity search.

  4. Structured Reporting
    Verification results and evidence passages are produced in JSON format, ensuring transparency and reproducibility.
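The core of step 3 can be sketched as follows. The real system uses neural vector embeddings from a local model; the bag-of-words `embed` below is only a hypothetical stand-in to show the shape of the similarity search that matches a claim against candidate passages from a cited paper:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" -- an illustrative stand-in for the
    # local neural embedding model the real pipeline would use.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_evidence(claim, passages):
    # Rank candidate passages from the cited paper by similarity
    # to the claim and return the strongest match with its score.
    claim_vec = embed(claim)
    return max((cosine_similarity(claim_vec, embed(p)), p) for p in passages)

claim = "transformer models outperform recurrent networks on translation"
passages = [
    "we describe a dataset of annotated bird songs",
    "the transformer outperforms recurrent networks on translation benchmarks",
]
score, passage = best_evidence(claim, passages)
```

With a real embedding model the score would come from dense vectors rather than word overlap, but the ranking-and-thresholding logic stays the same.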

System Design and Architecture

RelevanCite employs a modular and loosely coupled architecture, designed for flexibility, maintainability, and extensibility. Each component can operate independently, allowing:

  • Easy replacement or upgrades of embedding models, vector stores, or frontend interfaces.
  • Transparent data processing, with all intermediate artifacts stored as JSON.
  • Secure offline operation, ensuring sensitive research data is never uploaded to external servers.

Main Components

  1. Citation Extraction Module

    • Uses GROBID to parse manuscripts into structured citation metadata.
    • Produces bibliographic fields and reference lists as JSON.
  2. Document Retrieval Module

    • Fetches full-text papers from open-access sources like Unpaywall.
    • Supports manual PDF uploads and maintains a local PDF repository.
  3. Embedding and Similarity Engine

    • Converts claims and paper sections into vector embeddings.
    • Performs similarity search to identify relevant evidence.
    • Runs entirely on local GPU/CPU for security and privacy.
  4. Verification and Reporting

    • Compares claims against extracted evidence and produces structured JSON reports.
    • Ensures reproducibility and easy debugging.
  5. Watchdog and Service Layer

    • Monitors embedding and verification tasks.
    • Supports concurrent processing of multiple manuscripts.
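As a concrete sketch of the extraction module's job: GROBID returns parsed manuscripts as TEI XML, with each reference in a `biblStruct` element. A minimal parser over a trimmed sample (the element structure here is simplified from real GROBID output) could turn those entries into the JSON-ready records the pipeline stores:

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the TEI XML returned by GROBID's
# processFulltextDocument endpoint (structure simplified).
TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a">Attention Is All You Need</title>
        <author><persName>
          <forename>Ashish</forename><surname>Vaswani</surname>
        </persName></author>
      </analytic>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_references(tei_xml):
    # Pull the title and author surnames out of each biblStruct,
    # producing dicts ready to be dumped as JSON.
    root = ET.fromstring(tei_xml)
    refs = []
    for bibl in root.iterfind(".//tei:biblStruct", NS):
        title = bibl.findtext(".//tei:title", default="", namespaces=NS)
        authors = [s.text for s in bibl.iterfind(".//tei:surname", NS)]
        refs.append({"title": title, "authors": authors})
    return refs

refs = extract_references(TEI)
```

Keeping this step as plain XML-to-JSON translation is what lets the extraction module be swapped or upgraded independently of the rest of the pipeline.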

Data Handling

  • Papers: Stored locally as PDFs.
  • Intermediate results: All structured data, embeddings, and verification results are JSON.
  • Reports: Detailed JSON reports indicating claim support and evidence sections.
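A verification report in this scheme might look like the following. The field names are hypothetical, chosen only to illustrate the kind of structured, reproducible output described above, not RelevanCite's actual schema:

```python
import json

# Illustrative report structure -- field names are assumptions,
# not the tool's real output format.
report = {
    "manuscript": "draft.pdf",
    "claims": [
        {
            "claim": "Transformers outperform RNNs on translation.",
            "citation": "b0",
            "supported": True,
            "similarity": 0.87,
            "evidence": "The Transformer achieves higher BLEU scores ...",
        }
    ],
}

# Plain JSON on disk keeps runs reproducible and easy to diff or debug.
serialized = json.dumps(report, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because every intermediate artifact is JSON, a failed verification can be inspected and replayed without rerunning the whole pipeline.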

Future Enhancements

RelevanCite is evolving into a full research assistant with features such as:

  • Support for custom local models to improve embeddings or verification logic.
  • Integration with additional open-access sources for wider coverage.
  • Citation recommendation features to suggest more relevant or higher-quality references.
  • CPU-only execution mode for systems without GPU, while retaining offline local operation.

The goal is to create a system that not only verifies citations but also actively assists researchers in producing well-supported, high-quality work.

This architecture ensures that RelevanCite remains flexible, secure, and easy to extend for future research support capabilities.

Project Demonstration

GitHub Repository: github.com/Aayushstha03/RelevanCite
