80.1% on the LoCoMo Long-Term Memory Benchmark with a pure open-source RAG pipeline

3 points by ViktorKuz 6 hours ago

I just pushed a new SOTA result on the LoCoMo long-term memory benchmark for agents, 80.1% accuracy, using only:

-BGE-large-en-v1.5 (1024d) + FAISS

-Custom “MCA” gravitational ranking (keyword coverage + importance + frequency)

-BM25 sparse retrieval

-Direct Cross-Encoder reranking (bge-reranker-v2-m3) on the full union (~120-150 docs)

-GPT-4o-mini only for final answer generation and judging (everything else is open weights or classic)
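For anyone who wants the shape of the retrieval side without opening the repo, here is a minimal stdlib-only sketch of "several retrievers, one union, one reranker". It is not the code from the repo: `bm25_scores`, `top_k`, and `retrieve_and_rerank` are hypothetical names, the BM25 uses the Lucene-style IDF with whitespace tokenization, and `rerank_score` is a stand-in callable where the real pipeline would call the cross-encoder.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable, Iterable


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: no stemming or punctuation handling).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n if n else 0.0) or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def retrieve_and_rerank(
    query: str,
    docs: list[str],
    candidate_lists: Iterable[Iterable[int]],
    rerank_score: Callable[[str, str], float],
    final_k: int = 5,
) -> list[int]:
    """Merge candidate ids from all retrievers, then rerank the whole union."""
    union = sorted(set().union(*candidate_lists))
    ranked = sorted(union, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return ranked[:final_k]
```

In this sketch each retriever (dense, MCA, BM25) would contribute one list of document indices to `candidate_lists`, and nothing is dropped before the reranker sees it.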

Repo: https://github.com/vac-architector/VAC-Memory-System

Key tricks that finally broke 80%:

-MCA-first filter (coverage ≥ 0.1 → top-30) — catches exact-keyword questions early

-Feeding the entire union straight into Cross-Encoder (112–135 documents) instead of pre-filtering

-Proper query instruction for BGE-large (the classic “Represent this sentence for searching relevant passages”)
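A rough sketch of the first and third tricks, for illustration only. Assumptions: coverage is modeled as the fraction of query terms that appear in a document (the importance and frequency parts of MCA are not described in this post, so they are left out), the function names are hypothetical, and the trailing ": " on the instruction follows the BGE model card as far as I know. The instruction goes on queries only, not on the stored passages.

```python
from __future__ import annotations

# Query-side instruction for BGE-large (assumed exact string, including the trailing ": ").
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def with_query_instruction(query: str) -> str:
    """Prefix a query before embedding it; passages are embedded without any prefix."""
    return BGE_QUERY_INSTRUCTION + query


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption).
    return text.lower().split()


def keyword_coverage(query: str, doc: str) -> float:
    """Fraction of distinct query terms that occur in the document."""
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(doc))) / len(q_terms)


def coverage_filter(query: str, docs: list[str], min_coverage: float = 0.1, k: int = 30) -> list[int]:
    """Keep documents with coverage >= min_coverage and return the top k indices."""
    scored = [(keyword_coverage(query, d), i) for i, d in enumerate(docs)]
    kept = [(c, i) for c, i in scored if c >= min_coverage]
    # Highest coverage first; ties broken by original position.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in kept[:k]]
```

The defaults mirror the numbers above (0.1 threshold, top-30); the surviving indices would then join the FAISS and BM25 candidates in the union that goes to the cross-encoder.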

The whole pipeline runs in under 3 s per query on a single RTX 4090. LoCoMo is currently the hardest public long-term memory benchmark (5,880 real human–agent conversations, multi-hop, temporal, negation, etc.).

Beating the official Mem0 baseline by ~12–14 pp with fully open components feels pretty good. Would love feedback, especially from people who are also grinding on agent memory systems.

My background: my path didn't start in an IT office, but in Columbus, Ohio, where I worked as a handyman after leaving my job working on cell towers. The decision came from necessity: I bought a powerful PC on installments and resolved to create something that would change my life.

I had no experience, but I had an idea. Using Claude CLI as my sole mentor, I focused on architecture, not syntax.

Over 4.5 months of work, I designed and built the VAC Memory System. To prove its value, I tested it on the toughest RAG benchmark, LoCoMo. Today, my system shows an overall result of 80.1% and a phenomenal 87.78% in the "Commonsense" category.

This is more than just code; it is the result of faith in an idea. I showed that by using modern tools, it is possible to achieve SOTA-level performance and create serious technology, regardless of your starting point. I look forward to your feedback.