Tools and applications

Full stack applications

These applications are probably the reason why you ended up here. Most of the smaller packages below are part of their full stack.

OpenAleph – Search through large documents and structured data
Aleph – Original open-source core project, will no longer be maintained after October 2025
Aleph Pro – Closed-source SaaS version of original Aleph project, launching October 2025

Build data and datasets

Tools and frameworks for creating FollowTheMoney data with scrapers or custom applications.

followthemoney – core ontology and data validation system, includes CSV/SQL to FtM mapper.
memorious – lightweight web scraping toolkit for scrapers that collect structured or unstructured data
- A more recent fork of memorious
zavod – Data processing framework as part of OpenSanctions
investigraph – Framework to create FollowTheMoney data
ingest-file – Create document graphs out of source data for Aleph applications

Specialised data importers:

followthemoney-ocds – Convert Open Contracting Data Standard (OCDS) files to FtM
followthemoney-cellebrite – Import data forensics dumps from Cellebrite
Importers for BODS (Beneficial Ownership Data) and GLEIF RR files are in OpenSanctions.

Clean data

Tools and frameworks for cleaning and validating FollowTheMoney data.

rigour – Data cleaning and validation functions for processing various types of text emanating from and describing the business world; the base for followthemoney.
prefixdate – a helper class to parse dates with varied degrees of precision
datapatch – A Python library for defining rule-based overrides on messy data
normality – a Python micro-package that contains a small set of text normalization functions for easier re-use
countrytagger – extract country name references from text
followthemoney-typepredict – guess the FtM type class of a piece of text, including distinguishing company and person names.
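To illustrate what "dates with varied degrees of precision" means in practice, here is a minimal standard-library sketch. It is not the prefixdate API: the function name `parse_prefix` and the precision labels are hypothetical, and the validation is deliberately coarse (it does not check the day against the month's length).

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a four-digit year, optionally followed by -MM and -DD.
PREFIX_RE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_prefix(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (longest plausible date prefix, precision), or (None, None)."""
    match = PREFIX_RE.match(raw.strip())
    if match is None:
        return None, None
    year, month, day = match.groups()
    # Fall back to year precision when the month is missing or out of range.
    if month is None or not 1 <= int(month) <= 12:
        return year, "year"
    # Fall back to month precision when the day is missing or out of range.
    if day is None or not 1 <= int(day) <= 31:
        return f"{year}-{month}", "month"
    return f"{year}-{month}-{day}", "day"
```

With this sketch, a timestamp such as `2021-03-15T10:00:00` is cut to a day-precision prefix, while `2021-03` keeps only month precision and unparseable input yields `(None, None)`.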

Analyze data

Tools and frameworks for analyzing FollowTheMoney data, for example transcribing Audio and Video entities, detecting languages, or performing named entity recognition (NER).

ftm-analyze – The standalone ftm analyzer formerly included in ingest-file for all kinds of processing
ftm-geocode – Batch parse and geocode addresses from FollowTheMoney entities
ftm-transcribe – Extract text from Video and Audio
followthemoney-compare – pre-process data and train models to power a cross-reference system for FollowTheMoney data; includes a model based on regression and word frequency analysis in names.
juditha – Compare and resolve NER results to actual known FtM Entities
ingest-file.analysis – Part of the document ingestion is a comprehensive analysis phase used for Aleph applications

Store entity data

Tools and applications for storing and retrieving FollowTheMoney data, such as databases, key-value stores or document archives. Also includes tools for storing related data (such as images for Entities).

followthemoney-store – SQL-backed store for Entity fragments
nomenklatura – Store entity data as statements.
- Implementations for different graph-traversable backends (memory, redis, kvrocks, sql).
- Various entity matching algorithms (rule- and regression-based), and an in-memory cross-referencing index for data deduplication.
- A Wikidata client with mappings from their data model onto FtM statements (wants to become followthemoney-wikidata at some point)
- Data enrichment clients for building out investigative graphs pulling in remote info from Aleph, yente, Wikidata, OpenCorporates, PermID, OpenFIGI.
ftmq – More advanced querying logic on top of the nomenklatura store implementations
bahamut – WIP FollowTheMoney statement data server with built-in entity resolution support. Written in Java.
FollowTheMoney Data Lake – Scalable storage for structured data and document archives (upcoming)
ftm-columnstore – ClickHouse-backed implementation of a nomenklatura statement store
servicelayer – Document archive for legacy Aleph and OpenAleph
leakrfc – data standard and archive storage for leaked data, private and public document collections, will become ftm-datalake (see above)
ftm-assets – Assets (image) resolver and storage for FollowTheMoney data
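Several of the stores above keep entity data "as statements" rather than as whole entities. The sketch below shows the general idea using only the standard library: each statement is one claim about one property value from one dataset, and entities are assembled by grouping statements on the entity ID. The `Statement` fields and the `assemble` helper are simplified assumptions for illustration, not the nomenklatura data model or API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: one value claimed by one dataset."""
    entity_id: str
    schema: str
    prop: str
    value: str
    dataset: str


def assemble(statements: Iterable[Statement]) -> Dict[str, dict]:
    """Group statements by entity ID into entity-shaped dictionaries."""
    entities: Dict[str, dict] = {}
    for stmt in statements:
        entity = entities.setdefault(
            stmt.entity_id,
            {"id": stmt.entity_id, "schema": stmt.schema,
             "properties": {}, "datasets": []},
        )
        # Properties are multi-valued; keep each distinct value once.
        values = entity["properties"].setdefault(stmt.prop, [])
        if stmt.value not in values:
            values.append(stmt.value)
        # Track which datasets contributed to this entity.
        if stmt.dataset not in entity["datasets"]:
            entity["datasets"].append(stmt.dataset)
    return entities
```

Keeping the dataset on every statement is what makes it possible to trace each value back to its source after data from several datasets has been merged into one entity.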

IO / Streaming

Tools and helpers for streaming FollowTheMoney data between stores and systems.

alephclient – Getting data in and out of Aleph with its API
openaleph-client – alephclient fork for OpenAleph, adds more pre-processing capabilities.
ftmq.io – Generic helpers for reading and writing FollowTheMoney data from and to various local and remote locations
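As a rough picture of what streaming entities between systems involves, the following sketch writes and reads entities as line-delimited JSON using only the standard library. It assumes entities are exchanged as one JSON object per line; the helper names `write_entities` and `read_entities` are hypothetical and do not belong to any of the tools listed above.

```python
import io
import json
from typing import IO, Iterable, Iterator


def write_entities(fh: IO[str], entities: Iterable[dict]) -> int:
    """Write one JSON object per line; return the number of entities written."""
    count = 0
    for entity in entities:
        fh.write(json.dumps(entity, sort_keys=True) + "\n")
        count += 1
    return count


def read_entities(fh: IO[str]) -> Iterator[dict]:
    """Lazily yield entities from a line-delimited JSON stream."""
    for line in fh:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Usage: round-trip two entity-shaped dictionaries through an in-memory buffer.
sample = [
    {"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"]}},
    {"id": "c1", "schema": "Company", "properties": {"name": ["Acme Ltd"]}},
]
buffer = io.StringIO()
written = write_entities(buffer, sample)
buffer.seek(0)
restored = list(read_entities(buffer))
```

Because both helpers work on any text file handle and process one line at a time, the same pattern applies to local files, pipes, or network streams without loading a whole dataset into memory.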

API / Search

Building blocks for serving and searching FollowTheMoney datasets for web applications.

yente – API for OpenSanctions with support for entity search and bulk matching of data collections. Supports the Reconciliation API specification.
ftmq-api – Expose statement stores (by ftmq / nomenklatura) through a read-only FastAPI service
ftmq-search – Search experiments for FollowTheMoney data with different backends (SQLite FTS, tantivy, Elasticsearch)

Discontinued / legacy tools

These libraries have been discontinued or merged with others:

Aleph Data Desktop – desktop application for drawing investigative network diagrams.
pantomime – parsing and normalisation of internet MIME types in Python (discontinued, now in rigour.mime)
fingerprints – Name handling utilities for person and organisation names (discontinued, now in rigour.names)
languagecodes – normalise the ISO 639 codes used to describe languages from two-letter codes to three letters, and vice versa (discontinued, now in rigour.langs)
countrynames – mapping of country names to their respective two- or three-letter codes (now in rigour.territories)
addressformatting – address formatter that can format addresses in multiple formats that are common in different countries (discontinued, now in rigour.addresses)
followthemoney-predict – previous entity comparison/linkage codebase.