Tools and applications

Full stack applications

These applications are probably the reason why you ended up here. Most of the smaller packages below are part of their full stack.

  • OpenAleph – Search through large documents and structured data
  • Aleph – Original open-source core project, will no longer be maintained after October 2025
  • Aleph Pro – Closed-source SaaS version of original Aleph project, launching October 2025

Build data and datasets

Tools and frameworks for creating FollowTheMoney data with scrapers or custom applications.

  • followthemoney – core ontology and data validation system, includes CSV/SQL to FtM mapper.
  • memorious – light-weight web scraping toolkit for scrapers that collect structured or un-structured data
  • zavod – Data processing framework as part of OpenSanctions
  • investigraph – Framework to create FollowTheMoney data
  • ingest-file – Create document graphs out of source data for Aleph applications

Specialised data importers:

  • followthemoney-ocds – Convert Open Contracting Data Standard (OCDS) files to FtM
  • followthemoney-cellebrite – Import data forensics dumps from Cellebrite
  • Importers for BODS (Beneficial Ownership Data) and GLEIF RR files are in OpenSanctions.

Clean data

Tools and frameworks for cleaning and validating FollowTheMoney data.

  • rigour – Data cleaning and validation functions for the various types of text that originate from and describe the business world; followthemoney builds on it.
  • countrynames – Maps country names to their respective two- or three-letter codes
  • prefixdate – a helper class to parse dates with varied degrees of precision
  • datapatch – A Python library for defining rule-based overrides on messy data
  • normality – a Python micro-package that contains a small set of text normalization functions for easier re-use
  • countrytagger – extract country name references from text
  • followthemoney-typepredict – guess the FtM type class of a piece of text, including distinguishing company and person names.
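
As an illustration of what parsing dates "with varied degrees of precision" can mean in practice, here is a minimal standard-library sketch. It does not use or reproduce the prefixdate API; the names `PartialDate` and `parse_prefix` are hypothetical and chosen only for this example. The idea is to keep the longest leading part of a date string that is actually valid and to record how precise it is.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """Hypothetical result type: a normalized ISO prefix plus its precision."""
    text: Optional[str]
    precision: str  # "empty", "year", "month" or "day"


# Year, then optional month and day components separated by hyphens.
_PREFIX = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?")


def parse_prefix(raw: str) -> PartialDate:
    """Keep only the leading date components that form a valid date."""
    match = _PREFIX.match(raw.strip())
    if match is None:
        return PartialDate(None, "empty")
    year, month, day = match.groups()
    # Fall back to year precision if the month is missing or out of range.
    if month is None or not 1 <= int(month) <= 12:
        return PartialDate(year, "year")
    month_text = f"{year}-{int(month):02d}"
    if day is None:
        return PartialDate(month_text, "month")
    try:
        # Let the calendar decide whether the day exists (e.g. no 30 February).
        date(int(year), int(month), int(day))
    except ValueError:
        return PartialDate(month_text, "month")
    return PartialDate(f"{month_text}-{int(day):02d}", "day")
```

With this sketch, a timestamp such as `2021-03-15T10:00:00` is reduced to day precision, while a source value like `2021-3` is kept as a month rather than being padded into a date the source never stated.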

Analyze data

Tools and frameworks for analyzing FollowTheMoney data, for example transcribing Audio and Video entities, detecting languages, or performing named entity recognition (NER).

  • ftm-analyze – The standalone ftm analyzer formerly included in ingest-file for all kinds of processing
  • ftm-geocode – Batch parse and geocode addresses from FollowTheMoney entities
  • ftm-transcribe – Extract text from Video and Audio
  • followthemoney-compare – pre-process and train models to power a cross-reference system for FollowTheMoney data, includes a model based on regression and word frequency analysis in names.
  • juditha – Compare and resolve NER results to actual known FtM Entities
  • ingest-file.analysis – Part of the document ingestion is a comprehensive analysis phase used for Aleph applications

Store entity data

Tools and applications for storing and retrieving FollowTheMoney data, such as databases, key-value stores or document archives. Also includes tools for storing related data (such as images for Entities).

  • followthemoney-store – SQL-backed store for Entity fragments
  • nomenklatura – Store entity data as statements.
    • Implementations for different graph-traversable backends (memory, redis, kvrocks, sql).
    • Various entity matching algorithms (rule- and regression-based), and an in-memory cross-referencing index for data deduplication.
    • A Wikidata client with mappings from their data model onto FtM statements (wants to become followthemoney-wikidata at some point)
    • Data enrichment clients for building out investigative graphs pulling in remote info from Aleph, yente, Wikidata, OpenCorporates, PermID, OpenFIGI.
  • ftmq – More advanced querying logic on top of the nomenklatura store implementations
  • bahamut – WIP FollowTheMoney statement data server with built-in entity resolution support. Written in Java.
  • FollowTheMoney Data Lake – Scalable storage for structured data and document archives (upcoming)
  • ftm-columnstore – ClickHouse-backed implementation of a nomenklatura statement store
  • servicelayer – Document archive for legacy Aleph and OpenAleph
  • leakrfc – data standard and archive storage for leaked data, private and public document collections, will become ftm-datalake (see above)
  • ftm-assets – Assets (image) resolver and storage for FollowTheMoney data
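
To make the idea of storing entity data "as statements" more concrete, the following is a simplified standard-library sketch. It is not the nomenklatura implementation or API: the `Statement` fields and the `to_statements` and `assemble` helpers are hypothetical names for this example. The only assumption taken from FollowTheMoney itself is the entity shape with `id`, `schema` and multi-valued `properties`. Each property value becomes one row tagged with its source dataset, and entities are rebuilt by grouping rows on the entity ID.

```python
from typing import Dict, Iterable, List, NamedTuple


class Statement(NamedTuple):
    """Hypothetical, simplified statement: one property value from one dataset."""
    entity_id: str
    schema: str
    prop: str
    value: str
    dataset: str


def to_statements(entity: dict, dataset: str) -> List[Statement]:
    """Break an entity dict into one statement per property value."""
    return [
        Statement(entity["id"], entity["schema"], prop, value, dataset)
        for prop, values in entity["properties"].items()
        for value in values
    ]


def assemble(statements: Iterable[Statement]) -> Dict[str, dict]:
    """Group statements by entity ID and merge their values without duplicates."""
    entities: Dict[str, dict] = {}
    for stmt in statements:
        entity = entities.setdefault(
            stmt.entity_id,
            {"id": stmt.entity_id, "schema": stmt.schema,
             "properties": {}, "datasets": []},
        )
        values = entity["properties"].setdefault(stmt.prop, [])
        if stmt.value not in values:
            values.append(stmt.value)
        if stmt.dataset not in entity["datasets"]:
            entity["datasets"].append(stmt.dataset)
    return entities


# Two sources describe the same person; their statements merge into one entity.
stmts = to_statements(
    {"id": "p1", "schema": "Person",
     "properties": {"name": ["Jane Doe"], "nationality": ["de"]}},
    "source_a",
) + to_statements(
    {"id": "p1", "schema": "Person",
     "properties": {"name": ["Jane Doe", "J. Doe"]}},
    "source_b",
)
merged = assemble(stmts)
```

Keeping one row per value is what lets a store of this kind remember which dataset contributed which claim, something a single merged entity record would lose.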

IO / Streaming

Tools and helpers for streaming FollowTheMoney data between stores and systems.

  • alephclient – Getting data in and out of Aleph with its API
  • openaleph-client – alephclient fork for OpenAleph, adds more pre-processing capabilities.
  • ftmq.io – Generic helpers for reading and writing FollowTheMoney data from and to various local and remote locations

Building blocks for serving and searching FollowTheMoney datasets for web applications.

  • yente – API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API specification.
  • ftmq-api – Expose statement stores (by ftmq / nomenklatura) to a read-only FastAPI
  • ftmq-search – Search experiments for FollowTheMoney data with different backends (SQLite FTS, tantivy, Elasticsearch)

Discontinued / legacy tools

These libraries have been discontinued or merged with others: