Property and PropertyType

followthemoney.property.Property

A definition of a value-holding field on a schema. Properties define the field type and other possible constraints. They also serve as entity-to-entity references.

description property

A longer description of the semantics of this property.

label property

User-facing title for this property.

caption(value)

Return a user-friendly caption for the given value.

generate(model)

Setup method used when loading the model in order to build out the reverse links of the property.

specificity(value)

Return a measure of how precise the given value is.

to_dict()

Return property metadata in a serializable form.

validate(data)

Validate that the data should be stored.

Since the types system doesn't really have validation, this currently tries to normalize the value to see if it passes strict parsing.

followthemoney.types.common.PropertyType

Bases: object

Base class for all FtM property types.

Every property defined on a schema has a type attribute that points to a PropertyType instance. The type is responsible for cleaning incoming values, validating them, comparing them against other values of the same type, and producing display labels and graph node IDs.

Concrete types (NameType, DateType, CountryType, etc.) are instantiated once at module load and exposed as singletons on the registry. Application code should access them by name — registry.name, registry.date, registry.country — rather than instantiating them directly.

group = None class-attribute instance-attribute

Groups are used to invert all the properties of an entity that have a given type into a single list before indexing them. This way, in Aleph, you can query for countries:gb instead of having to make a set of filters like properties.jurisdiction:gb OR properties.country:gb OR ....
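The inversion described above can be illustrated with a small standalone sketch. It does not use the library: the `PROPERTY_GROUPS` mapping and the `invert_by_group` helper are hypothetical names chosen for illustration, and the real indexing code may work differently.

```python
from collections import defaultdict

# Hypothetical mapping from property name to the group of its type.
PROPERTY_GROUPS = {
    "jurisdiction": "countries",
    "country": "countries",
    "name": "names",
    "notes": None,  # a type without a group is not inverted
}


def invert_by_group(properties):
    """Collect the values of all properties that share a type group."""
    inverted = defaultdict(list)
    for prop, values in properties.items():
        group = PROPERTY_GROUPS.get(prop)
        if group is None:
            continue
        for value in values:
            # Keep each value once per group, preserving first-seen order.
            if value not in inverted[group]:
                inverted[group].append(value)
    return dict(inverted)


entity = {
    "jurisdiction": ["gb"],
    "country": ["gb", "fr"],
    "name": ["ACME Ltd"],
    "notes": ["n/a"],
}
index_doc = invert_by_group(entity)
```

With a document shaped like `index_doc`, a single filter on `countries` covers both `jurisdiction` and `country`.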

label = 'Any' class-attribute instance-attribute

A name for this type to be shown to users.

matchable = True class-attribute instance-attribute

Matchable types allow properties to be compared with each other in order to assess entity similarity. While it makes sense to compare names, countries or phone numbers, the same isn't true for raw JSON blobs or descriptive text snippets.

max_length = 250 class-attribute instance-attribute

The maximum length of a single value of this type. This is used to warn when adding individual values that may be malformed or too long to be stored in downstream databases with fixed column lengths. Length is measured in Unicode code points (not bytes), as returned by Python's len().
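The difference between code points and bytes matters for non-ASCII values. The following sketch uses a hypothetical `exceeds_max_length` helper (not part of the library) to show that a value can fit within the limit while its UTF-8 encoding is longer.

```python
MAX_LENGTH = 250  # default documented for the base type


def exceeds_max_length(value, max_length=MAX_LENGTH):
    """Hypothetical check: compare the code point count to the limit."""
    return len(value) > max_length


value = "\u00e9" * 200  # 200 copies of a single code point
codepoints = len(value)
utf8_bytes = len(value.encode("utf-8"))  # each code point takes 2 bytes here
too_long = exceeds_max_length(value)
```

A downstream column sized in bytes would therefore need more headroom than `max_length` alone suggests.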

name = const('any') class-attribute instance-attribute

A machine-facing, variable safe name for the given type.

pivot = False class-attribute instance-attribute

Pivot property types are like a stronger form of matchable types: they will be used when value-based lookups are used to find commonalities between entities. For example, pivot typed-properties are used to show all the other entities that mention the same phone number, email address or name as the one currently seen by the user.

plural = 'Any' class-attribute instance-attribute

A plural name for this type which can be used in appropriate places in a user interface.

total_size = None class-attribute instance-attribute

Some types have overall size limitations in place in order to avoid generating entities that are very large (upstream Elasticsearch has a 100MB document limit). Once the total size of all properties of this type has exceeded the given limit, an entity will refuse to add further values.
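A rough sketch of such a size budget follows. It is not the library's implementation: the `SizeLimitedValues` class is hypothetical, and it assumes that size is measured with `len()` and that a value is refused when adding it would push the running total past the limit.

```python
class SizeLimitedValues:
    """Hypothetical container for the values of one type on one entity."""

    def __init__(self, total_size=None):
        self.total_size = total_size  # None means no overall limit
        self.size = 0
        self.values = []

    def add(self, value):
        """Append the value and return True, or return False if refused."""
        over = (
            self.total_size is not None
            and self.size + len(value) > self.total_size
        )
        if over:
            return False
        self.values.append(value)
        self.size += len(value)
        return True


texts = SizeLimitedValues(total_size=10)
accepted = [texts.add("abcdef"), texts.add("ghij"), texts.add("k")]
```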

caption(value, format=None)

Return a label for the given property value. This is often the same as the value, but for types like countries or languages, it would return the label, while other values like phone numbers can be formatted to be nicer to read.

clean(raw, fuzzy=False, format=None, proxy=None)

Convert a raw value into its canonical form for storage on an entity.

Returns None if the value is empty or cannot be interpreted as this type. The fuzzy flag loosens validation for types that support it (dates, identifiers). format supplies a type-specific hint — for example, a strptime format string for dates. proxy is the entity the value is being added to, which some types use for context-aware cleaning (address normalization can use the entity's country, for example).

This method converts the input to a string, drops null-equivalents, and then delegates to clean_text. Subclasses normally override clean_text rather than this method.

clean_text(text, fuzzy=False, format=None, proxy=None)

Type-specific cleaning hook.

Override this in subclasses to normalize a non-null string value into the type's canonical representation. Return None to reject the value. The base implementation is a pass-through. clean() calls this after stringifying the input and filtering nulls.
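The division of labour between `clean()` and `clean_text()` can be sketched without the library. The classes below are stand-ins with hypothetical names; `str()` plus `strip()` is an assumed simplification of the library's stringification and null filtering, and the country code set is invented for the example.

```python
class SketchType:
    """Stand-in for the base type; not the library's implementation."""

    def clean(self, raw, fuzzy=False, format=None, proxy=None):
        # Stringify, drop null-equivalents, then delegate to the hook.
        if raw is None:
            return None
        text = str(raw).strip()
        if not text:
            return None
        return self.clean_text(text, fuzzy=fuzzy, format=format, proxy=proxy)

    def clean_text(self, text, fuzzy=False, format=None, proxy=None):
        # Base hook: pass the value through unchanged.
        return text


class SketchCountryType(SketchType):
    """Hypothetical subclass that overrides only the hook."""

    CODES = {"gb", "fr", "de"}  # illustrative subset

    def clean_text(self, text, fuzzy=False, format=None, proxy=None):
        code = text.lower()
        # Returning None rejects the value.
        return code if code in self.CODES else None


base = SketchType()
country = SketchCountryType()
results = [base.clean(42), base.clean("   "), country.clean("GB"), country.clean("zz")]
```

Because the subclass overrides only `clean_text()`, it inherits the stringification and null handling from `clean()` unchanged.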

compare(left, right)

Score the similarity of two values of this type.

Returns a float in [0.0, 1.0]: 0.0 means the values carry no evidence of matching, 1.0 means they are identical in the strongest type-specific sense. Intermediate values quantify partial similarity — for names, the Levenshtein ratio; for countries, territory overlap; for dates, precision-aware proximity.

The base implementation does a lowercase equality check weighted by specificity(), so a match on a longer, more specific value scores higher than a match on a short one. Subclasses override this for richer comparisons.

Values are assumed to be cleaned (output of clean()) but not further normalized — compare is the right place to apply type-specific normalization before matching.
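The base behaviour described above (case-insensitive equality weighted by specificity) can be sketched as follows. The `length_specificity` function is a hypothetical stand-in used only to make the weighting visible; the specificity of a real type depends on that type.

```python
def length_specificity(value):
    """Hypothetical specificity: longer values score higher, capped at 1.0."""
    return min(len(value) / 20, 1.0)


def compare(left, right, specificity=length_specificity):
    """Sketch of a lowercase equality check weighted by specificity."""
    if left.lower() == right.lower():
        return 1.0 * specificity(left)
    return 0.0


long_match = compare("Acme Holdings Limited", "ACME HOLDINGS LIMITED")
short_match = compare("Al", "al")
no_match = compare("Acme", "Apex")
```

Both matching pairs are equal after lowercasing, but the longer value receives the higher score.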

compare_safe(left, right)

Variant of compare() that accepts None on either side.

Returns 0.0 if either argument is missing. Otherwise delegates to compare().

compare_sets(left, right, func=max)

Score the similarity of two value sets by reducing pairwise comparisons.

Every element of left is compared to every element of right, and the resulting scores are reduced with func (max by default), so the best pairwise match wins. Returns 0.0 if either set is empty. Pass func=sum or a statistical mean for alternative aggregation strategies.
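A standalone sketch of the None-safe and set-based variants is shown below. The pairwise `compare` here is a simplified placeholder (plain case-insensitive equality), and the sketch assumes that `func` receives the list of pairwise scores.

```python
from itertools import product
from statistics import mean


def compare(left, right):
    """Placeholder pairwise comparison: case-insensitive equality."""
    return 1.0 if left.lower() == right.lower() else 0.0


def compare_safe(left, right):
    """Return 0.0 when either side is missing, else delegate."""
    if left is None or right is None:
        return 0.0
    return compare(left, right)


def compare_sets(left, right, func=max):
    """Reduce all pairwise scores with func; empty input scores 0.0."""
    scores = [compare(le, ri) for le, ri in product(left, right)]
    if not scores:
        return 0.0
    return func(scores)


best = compare_sets(["gb", "fr"], ["FR", "de"])
average = compare_sets(["gb", "fr"], ["FR", "de"], func=mean)
empty = compare_sets([], ["de"])
missing = compare_safe(None, "de")
```

With `max`, one strong pairwise match is enough for a high score; a mean dilutes that match across all pairs.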

country_hint(value)

Determine if the given value allows us to infer a country that it may be related to (e.g. using a country prefix on a phone number or IBAN).

join(values)

Render multiple values of this type as a single string.

Used when flattening multi-valued properties into formats that allow only one value per cell (CSV, some RDF serializations). Values are joined with ;. The transformation is not reversible — use only at the final serialization step.
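The following sketch shows why the transformation is not reversible. It assumes a bare `;` separator with no escaping, as described above; the standalone `join` function is illustrative rather than the library's code.

```python
def join(values):
    """Sketch: flatten several values into one cell with a bare separator."""
    return ";".join(values)


original = ["Smith; John", "Doe"]
cell = join(original)

# Splitting on the separator does not recover the original values,
# because the first value already contained a semicolon.
round_trip = cell.split(";")
```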

node_id(value)

Build a graph node ID for a typed property value.

Used by graph exporters (Cypher, GEXF, Neo4J bulk) when reifying property values into their own graph nodes — for example, turning every phone number mentioned by any entity into a single node connected to the entities that carry it. The default encoding is {type}:{value}, matching the RDF URN form.

node_id_safe(value)

Wrapper for node_id to handle None values.
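The default `{type}:{value}` encoding and its None-tolerant wrapper can be sketched as plain functions. These take the type name as an explicit argument for illustration; in the library they are methods on the type, and returning None for a missing value is an assumption.

```python
def node_id(type_name, value):
    """Sketch of the default {type}:{value} node ID encoding."""
    return f"{type_name}:{value}"


def node_id_safe(type_name, value):
    """Return None for a missing value instead of building an ID."""
    if value is None:
        return None
    return node_id(type_name, value)


phone_node = node_id_safe("phone", "+442071234567")
missing_node = node_id_safe("phone", None)
```

Two entities that carry the same phone number would produce the same node ID, which is what lets an exporter connect them through a shared node.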

pick(values)

Choose the best representative value from a set of alternatives.

Used when a UI needs to display a single value for a multi-valued property, or when reducing a set of similar values to a canonical form (for example, picking the most complete variant of a name). Subclasses that support picking — notably NameType — implement type-specific heuristics. The base implementation raises NotImplementedError.

specificity(value)

Return a score for how specific the given value is. This can be used as a weighting factor in entity comparisons in order to rate matching property values by how specific they are. For example: a longer address is considered to be more specific than a short one, a full date more specific than just a year number, etc.

to_dict()

Return a serialisable description of this data type.

validate(value, fuzzy=False, format=None)

Returns a boolean to indicate if the given value is a valid instance of the type.