Tools and applications
Full stack applications
These applications are probably the reason why you ended up here. Most of the smaller packages below are part of their full stack.
- OpenAleph – Search through large documents and structured data
- Aleph – Original open-source core project, will no longer be maintained after October 2025
- Aleph Pro – Closed-source SaaS version of original Aleph project, launching October 2025
Build data and datasets
Tools and frameworks for creating FollowTheMoney data with scrapers or custom applications.
- followthemoney – core ontology and data validation system, includes CSV/SQL to FtM mapper.
- memorious – lightweight web scraping toolkit for scrapers that collect structured or unstructured data
- A more recent fork of memorious
- zavod – Data processing framework as part of OpenSanctions
- investigraph – Framework to create FollowTheMoney data
- ingest-file – Create document graphs out of source data for Aleph applications
Specialised data importers:
- followthemoney-ocds - Convert open contracting data standard files to FtM
- followthemoney-cellebrite - Import data forensics dumps from Cellebrite
- Importers for BODS (Beneficial Ownership Data) and GLEIF RR files are in OpenSanctions.
Clean data
Tools and frameworks for cleaning and validating FollowTheMoney data.
- rigour – Data cleaning and validation functions for processing various types of text emanating from and describing the business world; the base for followthemoney.
- countrynames – Maps country names to their respective two- or three-letter codes
- prefixdate – a helper class to parse dates with varied degrees of precision
- datapatch – A Python library for defining rule-based overrides on messy data
- normality – a Python micro-package that contains a small set of text normalization functions for easier re-use
- countrytagger – extract country name references from text
- followthemoney-typepredict - guess the FtM type class of a piece of text, including distinguishing company and person names.
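To illustrate what "dates with varied degrees of precision" means in practice, here is a minimal pure-Python sketch. It is not the prefixdate API: the names `PartialDate` and `parse_prefix` are hypothetical, and the sketch only assumes that incomplete dates such as `2021` or `2021-3` should be kept as the longest valid ISO prefix rather than rejected or padded with invented values.

```python
import re
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """Hypothetical result type, not part of prefixdate."""
    text: Optional[str]  # normalised ISO prefix, e.g. "2021-03"
    precision: str       # "day", "month", "year" or "empty"


# Year is required; month and day are optional and may lack zero-padding.
_PATTERN = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?")


def parse_prefix(raw: str) -> PartialDate:
    """Keep the longest plausible date prefix found at the start of `raw`."""
    match = _PATTERN.match(raw.strip())
    if match is None:
        return PartialDate(None, "empty")
    year, month, day = match.groups()
    # Fall back to year precision when the month is missing or out of range.
    if month is None or not 1 <= int(month) <= 12:
        return PartialDate(year, "year")
    month = month.zfill(2)
    # Range check only; this sketch does not validate days per month.
    if day is None or not 1 <= int(day) <= 31:
        return PartialDate(f"{year}-{month}", "month")
    return PartialDate(f"{year}-{month}-{day.zfill(2)}", "day")
```

Keeping the precision alongside the normalised text lets downstream code compare or display a year-only birth date without pretending it happened on 1 January.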
Analyze data
Tools and frameworks for analyzing FollowTheMoney data, for example transcribing audio and video entities, detecting languages, or extracting named entities (NER).
- ftm-analyze – The standalone ftm analyzer formerly included in ingest-file for all kinds of processing
- ftm-geocode – Batch parse and geocode addresses from FollowTheMoney entities
- ftm-transcribe – Extract text from Video and Audio
- followthemoney-compare – pre-process and train models to power a cross-reference system for FollowTheMoney data, includes a model based on regression and word frequency analysis in names.
- juditha – Compare and resolve NER results to actual known FtM Entities
- ingest-file.analysis – Part of the document ingestion is a comprehensive analysis phase used for Aleph applications
Store entity data
Tools and applications for storing and retrieving FollowTheMoney data such as databases, key-value stores or document archives. Also includes tools for storing related data (such as images for entities).
- followthemoney-store – SQL-backed store for Entity fragments
- nomenklatura – Store entity data as statements.
- Implementations for different graph-traversable backends (memory, redis, kvrocks, sql).
- Various entity matching algorithms (rule- and regression-based), and an in-memory cross-referencing index for data deduplication.
- A Wikidata client with mappings from their data model onto FtM statements (wants to become followthemoney-wikidata at some point)
- Data enrichment clients for building out investigative graphs pulling in remote info from Aleph, yente, Wikidata, OpenCorporates, PermID, OpenFIGI.
- ftmq – More advanced querying logic on top of the nomenklatura store implementations
- bahamut – WIP FollowTheMoney statement data server with built-in entity resolution support. Written in Java.
- FollowTheMoney Data Lake – Scalable storage for structured data and document archives (upcoming)
- ftm-columnstore – ClickHouse-backed implementation of a nomenklatura statement store
- servicelayer – Document archive for legacy Aleph and OpenAleph
- leakrfc – data standard and archive storage for leaked data, private and public document collections, will become ftm-datalake (see above)
- ftm-assets – Assets (image) resolver and storage for FollowTheMoney data
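The idea of storing entity data as statements, as several of the tools above do, can be pictured with a small pure-Python sketch. This is an illustration only and not the nomenklatura API or schema: the `Statement` fields and the `assemble` helper are assumptions chosen for the example. Each statement records one property value for one entity together with the dataset it came from, and entities are assembled on demand from whichever statements are in scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Statement(NamedTuple):
    """Hypothetical statement row: one value of one property, with provenance."""
    entity_id: str
    prop: str
    value: str
    dataset: str


def assemble(
    statements: Iterable[Statement],
    datasets: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, List[str]]]:
    """Group statements into entities, optionally restricted to some datasets."""
    entities: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for stmt in statements:
        if datasets is not None and stmt.dataset not in datasets:
            continue
        values = entities[stmt.entity_id][stmt.prop]
        # The same value asserted by two datasets is kept only once.
        if stmt.value not in values:
            values.append(stmt.value)
    return {eid: dict(props) for eid, props in entities.items()}


rows = [
    Statement("acme", "name", "ACME Ltd", "registry"),
    Statement("acme", "name", "ACME Ltd", "sanctions"),
    Statement("acme", "country", "gb", "sanctions"),
]
merged = assemble(rows)
registry_only = assemble(rows, datasets={"registry"})
```

Because provenance lives on every row, the same store can serve a merged view across all sources or a view limited to a single dataset without duplicating entities.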
IO / Streaming
Tools and helpers for streaming FollowTheMoney data between stores and systems.
- alephclient – Getting data in and out of Aleph with its API
- openaleph-client – alephclient fork for OpenAleph, adds more pre-processing capabilities.
- ftmq.io – Generic helpers for reading and writing FollowTheMoney data from and to various local and remote locations
API / Search
Building blocks for serving and searching FollowTheMoney datasets for web applications.
- yente – API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API specification.
- ftmq-api – Exposes statement stores (by ftmq/nomenklatura) through a read-only FastAPI application
- ftmq-search – Search experiments for FollowTheMoney data with different backends (SQLite FTS, tantivy, Elasticsearch)
Discontinued / legacy tools
These libraries have been discontinued or merged with others:
- Aleph Data Desktop – desktop application for drawing investigative network diagrams.
- pantomime – parsing and normalisation of internet MIME types in Python (discontinued, now in rigour.mime)
- fingerprints – Name handling utilities for person and organisation names (discontinued, now in rigour.names)
- languagecodes – normalise the ISO 639 codes used to describe languages from two-letter codes to three letters, and vice versa (discontinued, now in rigour.langs)
- addressformatting – address formatter that can format addresses in multiple formats that are common in different countries (discontinued, now in rigour.addresses)
- followthemoney-predict - previous entity comparison/linkage codebase.