When the Department of Justice released millions of pages of documents related to Jeffrey Epstein in early 2025, most people scrolled past the headlines and moved on. One data engineer saw something different: a raw dataset waiting to be understood.
What started as casual reading evolved into something far more consuming. The engineer, Max Andrews, began writing code to extract relationships from the documents. Names. Connections. Patterns. Each line of code pulled more threads, and each thread led somewhere new. The casual project became an obsession.
The result was a sophisticated network explorer that turned chaos into clarity. The tool uses AI to extract entities and relationships from the documents, then visualises them as an interactive graph. Rather than searching one name at a time, users can see how people connect to each other, how money moved, how the network operated. A force-directed graph surfaces the most densely connected clusters. Timeline views show relationships evolving over time. Filtering options let researchers narrow results by content category, from financial transactions to travel arrangements.
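To make the idea concrete, here is a minimal sketch of that final visualisation step, assuming the extraction stage emits simple (source, target, category) triples and using networkx's spring layout as the force-directed algorithm. The names and edges are placeholders, not drawn from the actual files or from Andrews's code.

```python
# Minimal sketch: render extracted relationships as a force-directed graph.
# Entity names and edges below are placeholders, not from the real dataset.
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical output of an extraction step: (source, target, category) triples.
relationships = [
    ("Person A", "Person B", "travel"),
    ("Person A", "Person C", "financial"),
    ("Person B", "Person C", "correspondence"),
    ("Person C", "Person D", "travel"),
]

G = nx.Graph()
for src, dst, category in relationships:
    # Weight edges by how often a pair co-occurs; keep the category for filtering.
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1
    else:
        G.add_edge(src, dst, weight=1, category=category)

# Force-directed (spring) layout: densely connected nodes pull together.
pos = nx.spring_layout(G, weight="weight", seed=42)
nx.draw_networkx(G, pos, node_size=600, font_size=8)
plt.axis("off")
plt.show()
```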
This approach to data organisation represents a particular kind of power. Traditional document searches rely on keywords and names: a researcher must already know whom to look for. A network graph, by contrast, reveals structure itself, showing who appears alongside whom, which connections cluster together, and where the gaps emerge.
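One simple way to see why: if an extraction pass has already produced a list of entities per document, counting which pairs share documents yields a network without anyone deciding in advance whom to search for. The sketch below illustrates that co-occurrence idea on invented data.

```python
# Sketch of the co-occurrence idea: two entities appearing in the same document
# get an edge, so structure emerges without searching for any one name.
# The documents and entities are illustrative, not drawn from the real files.
from collections import Counter
from itertools import combinations

# Hypothetical per-document entity lists produced by an extraction pass.
doc_entities = [
    ["Person A", "Person B", "Person C"],
    ["Person A", "Person C"],
    ["Person B", "Person D"],
]

pair_counts = Counter()
for entities in doc_entities:
    # Count each unordered pair of entities once per document.
    for a, b in combinations(sorted(set(entities)), 2):
        pair_counts[(a, b)] += 1

# The most frequent pairs hint at the densest parts of the network.
for (a, b), count in pair_counts.most_common(5):
    print(f"{a} <-> {b}: {count} shared documents")
```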
The technical challenge was substantial. The Epstein files span millions of pages across a dozen datasets, containing emails, flight logs, financial records, and investigative summaries. Extracting structured data from unstructured text required AI assistance. The project uses Claude AI to identify entities, recognise relationships, and deduplicate entries where the same person appears under multiple name variations. It tags content across 30 categories and clusters those tags into semantic groups for easier filtering.
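Deduplication is the fiddliest of those steps. As a rough illustration of the string-matching side of the problem, a similarity threshold over normalised names can merge obvious variants; the project itself reportedly leans on Claude for this, so the approach below is only a stand-in, and the 0.85 threshold is arbitrary.

```python
# Illustrative name deduplication using a string-similarity threshold.
# A real pipeline would also weigh document context; this is only a sketch.
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True if two names look like variants of the same person."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def dedupe(names: list[str]) -> dict[str, str]:
    """Map each raw name to a canonical form (the first variant seen)."""
    canonical: dict[str, str] = {}
    for name in names:
        match = next((c for c in canonical.values() if similar(name, c)), None)
        canonical[name] = match or name
    return canonical

# "Jon Smith" should collapse onto "John Smith"; "Jane Doe" stays separate.
print(dedupe(["John Smith", "Jon Smith", "Jane Doe"]))
```

In practice the threshold needs tuning: set it too low and distinct people merge, too high and common variants such as abbreviated first names slip through.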
Andrews was not alone in this pursuit. Multiple independent researchers and projects emerged with similar goals. Some built their own network graphs using Neo4j, a specialist graph database. Others created searchable indexes with advanced filtering. One effort indexed 1.4 million documents and 2.8 million pages. Another researcher published an analysis using community detection algorithms, finding five distinct social circles orbiting Epstein.
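Community detection itself is standard graph machinery. The sketch below runs networkx's greedy modularity implementation on an invented toy graph to show the shape of such an analysis; it is not the researcher's actual method or data.

```python
# Illustrative only: community detection on a toy co-occurrence graph using
# networkx's greedy modularity maximisation. Edges and weights are placeholders.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 3), ("B", "C", 2), ("A", "C", 4),   # one tight cluster
    ("D", "E", 5), ("E", "F", 2), ("D", "F", 3),   # another tight cluster
    ("C", "D", 1),                                  # weak bridge between them
])

# Each returned set of nodes is one detected "social circle".
for i, community in enumerate(greedy_modularity_communities(G, weight="weight"), 1):
    print(f"Community {i}: {sorted(community)}")
```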
Yet the tools themselves illustrate a deeper problem: the Epstein files are so large, so complex, that ordinary reading becomes impractical. A journalist or researcher cannot reasonably read three million pages. Even the most sophisticated search tools can only surface what the documents explicitly contain. As network scientists have noted, the most sensitive relationships often leave the fewest traces. Secrecy by design creates gaps in any database.
The work also raises harder questions about what it means to map a hidden network. When researchers see someone mentioned frequently in documents, does that reveal their actual importance, or merely their documentary visibility? A person who leaves few written traces might hold more influence than someone whose communications are well-preserved. The structure of what we can observe shapes what we think we know.
For Andrews and others like him, the obsession served something genuine. These tools make a vast archive navigable. They let researchers, journalists, and citizens examine connections that would otherwise remain buried. They democratise access to information that powerful institutions controlled for decades.
But there is a cost to that kind of focus, a personal toll that projects like this can exact. Building the definitive database means living with it, seeing connections everywhere, unable to unsee the web once you have mapped it. The pursuit of transparency can consume the pursuer entirely.