Property and PropertyType
followthemoney.property.Property
A definition of a value-holding field on a schema. Properties define the field type and other possible constraints. They also serve as entity-to-entity references.
description
property
A longer description of the semantics of this property.
label
property
User-facing title for this property.
caption(value)
Return a user-friendly caption for the given value.
generate(model)
Setup method used when loading the model in order to build out the reverse links of the property.
specificity(value)
Return a measure of how precise the given value is.
to_dict()
Return property metadata in a serializable form.
validate(data)
Validate that the data should be stored.
Because the type system has no dedicated validation step, this currently tries to normalize the value and checks whether it passes strict parsing.
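The idea of validating by attempting a strict normalization can be sketched as follows. This is a standalone illustration and not the library's code: `clean_date`, `validate` and the error-message return shape are assumptions made for the example.

```python
from datetime import datetime


def clean_date(text):
    """Strictly parse an ISO date; return None when parsing fails."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date().isoformat()
    except ValueError:
        return None


def validate(values):
    """Return an error message for the first bad value, or None if all pass.

    The return shape is an assumption of this sketch.
    """
    for value in values:
        if clean_date(value) is None:
            return f"Invalid value: {value!r}"
    return None


ok = validate(["2020-01-31"])
error = validate(["2020-01-31", "2020-02-30"])  # 30 February does not exist
```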
followthemoney.types.common.PropertyType
Bases: object
Base class for all FtM property types.
Every property defined on a schema has a type attribute that points to a
PropertyType instance. The type is responsible for cleaning incoming values,
validating them, comparing them against other values of the same type, and
producing display labels and graph node IDs.
Concrete types (NameType, DateType, CountryType, etc.) are instantiated
once at module load and exposed as singletons on the registry. Application
code should access them by name — registry.name, registry.date,
registry.country — rather than instantiating them directly.
group = None
class-attribute
instance-attribute
Groups are used to invert all the properties of an entity that have a
given type into a single list before indexing them. This way, in Aleph,
you can query for countries:gb instead of having to make a set of filters
like properties.jurisdiction:gb OR properties.country:gb OR ....
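As a rough illustration of this inversion, the sketch below gathers values from several properties into one list per group. It does not use the FtM API; `PROPERTY_GROUPS`, `invert_by_group` and the sample data are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical mapping from property name to the group of its type.
# In practice this would be derived from each property's type.
PROPERTY_GROUPS = {
    "jurisdiction": "countries",
    "country": "countries",
    "name": "names",
    "alias": "names",
    "notes": None,  # a type without a group is not inverted
}


def invert_by_group(properties):
    """Collect the values of all properties that share a type group."""
    inverted = defaultdict(set)
    for prop, values in properties.items():
        group = PROPERTY_GROUPS.get(prop)
        if group is None:
            continue
        inverted[group].update(values)
    # Sort for a stable representation.
    return {group: sorted(vals) for group, vals in inverted.items()}


entity_properties = {
    "name": ["ACME Ltd"],
    "alias": ["Acme Limited"],
    "jurisdiction": ["gb"],
    "country": ["gb", "us"],
    "notes": ["free text"],
}
grouped = invert_by_group(entity_properties)
```

With the values collected this way, a single filter on the `countries` field can stand in for one filter per country-typed property.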
label = 'Any'
class-attribute
instance-attribute
A name for this type to be shown to users.
matchable = True
class-attribute
instance-attribute
Matchable types allow properties to be compared with each other in order to assess entity similarity. While it makes sense to compare names, countries or phone numbers, the same isn't true for raw JSON blobs or descriptive text snippets.
max_length = 250
class-attribute
instance-attribute
The maximum length of a single value of this type. This is used to warn when adding individual values that may be malformed or too long to be stored in downstream databases with fixed column lengths. The unit is Unicode code points (not bytes), that is, the value returned by Python's len().
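The difference between code points and bytes matters for non-ASCII text. The short sketch below shows a value that fits the 250 code point limit while occupying 500 bytes in UTF-8; `is_too_long` is a hypothetical helper, not part of the library.

```python
MAX_LENGTH = 250  # the base-class default quoted above


def is_too_long(value, max_length=MAX_LENGTH):
    """Hypothetical helper: True when the value exceeds the code point limit."""
    return len(value) > max_length


accented = "\u00e9" * 250           # 250 code points (precomposed e-acute)
encoded = accented.encode("utf-8")  # two bytes per character in UTF-8
```

A downstream column sized in bytes would therefore need more room than `max_length` alone suggests.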
name = const('any')
class-attribute
instance-attribute
A machine-facing, variable-safe name for the given type.
pivot = False
class-attribute
instance-attribute
Pivot property types are like a stronger form of matchable types:
they are used when value-based lookups find commonalities between
entities. For example, pivot-typed properties are used to show all the
other entities that mention the same phone number, email address or name as the
one currently seen by the user.
plural = 'Any'
class-attribute
instance-attribute
A plural name for this type which can be used in appropriate places in a user interface.
total_size = None
class-attribute
instance-attribute
Some types have overall size limitations in place in order to avoid generating entities that are very large (upstream Elasticsearch has a 100MB document limit). Once the total size of all properties of this type has exceeded the given limit, an entity will refuse to add further values.
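A minimal sketch of such a size budget is shown below. `ValueBudget` is a hypothetical class, and two details are assumptions of the example: size is measured with `len()`, and a value is refused when adding it would push the running total past the limit.

```python
class ValueBudget:
    """Hypothetical tracker for the combined size of values of one type."""

    def __init__(self, total_size):
        self.total_size = total_size  # None means "no limit"
        self.used = 0
        self.values = []

    def add(self, value):
        """Store the value and return True, or return False if over budget."""
        size = len(value)
        if self.total_size is not None and self.used + size > self.total_size:
            return False
        self.values.append(value)
        self.used += size
        return True


budget = ValueBudget(total_size=10)
results = [budget.add("hello"), budget.add("world"), budget.add("!")]
```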
caption(value, format=None)
Return a label for the given property value. This is often the same as the value, but for types like countries or languages, it would return the label, while other values like phone numbers can be formatted to be nicer to read.
clean(raw, fuzzy=False, format=None, proxy=None)
Convert a raw value into its canonical form for storage on an entity.
Returns None if the value is empty or cannot be interpreted as this type.
The fuzzy flag loosens validation for types that support it (dates,
identifiers). format supplies a type-specific hint — for example, a
strptime format string for dates. proxy is the entity the value is
being added to, which some types use for context-aware cleaning (address
normalization can use the entity's country, for example).
This method converts the input to a string, drops null-equivalents, and
then delegates to clean_text. Subclasses normally override clean_text
rather than this method.
clean_text(text, fuzzy=False, format=None, proxy=None)
Type-specific cleaning hook.
Override this in subclasses to normalize a non-null string value into the
type's canonical representation. Return None to reject the value. The
base implementation is a pass-through. clean() calls this after
stringifying the input and filtering nulls.
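The division of labour between clean() and clean_text() can be sketched with a pair of stand-in classes. These are not the FtM classes: `SketchType`, `SketchCountryType` and the tiny `CODES` set are hypothetical, and the null handling shown (None and blank strings) is a simplification.

```python
class SketchType:
    """Stand-in base type: stringify, drop nulls, delegate to clean_text."""

    def clean(self, raw, fuzzy=False, format=None, proxy=None):
        if raw is None:
            return None
        text = str(raw).strip()
        if not text:
            return None
        return self.clean_text(text, fuzzy=fuzzy, format=format, proxy=proxy)

    def clean_text(self, text, fuzzy=False, format=None, proxy=None):
        # Pass-through by default, as described above.
        return text


class SketchCountryType(SketchType):
    """Subclass that overrides only the type-specific hook."""

    CODES = {"gb", "us", "de"}  # illustrative subset

    def clean_text(self, text, fuzzy=False, format=None, proxy=None):
        code = text.lower()
        return code if code in self.CODES else None  # None rejects the value


base = SketchType()
country = SketchCountryType()
```

Because the subclass never touches clean(), stringification and null filtering stay consistent across all types.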
compare(left, right)
Score the similarity of two values of this type.
Returns a float in [0.0, 1.0]: 0.0 means the values carry no evidence
of matching, 1.0 means they are identical in the strongest
type-specific sense. Intermediate values quantify partial similarity —
for names, the Levenshtein ratio; for countries, territory overlap; for
dates, precision-aware proximity.
The base implementation does a lowercase equality check weighted by
specificity(), so a match on a longer, more specific value scores higher
than a match on a short one. Subclasses override this for richer
comparisons.
Values are assumed to be cleaned (output of clean()) but not further
normalized — compare is the right place to apply type-specific
normalization before matching.
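The base behaviour described above, a case-insensitive equality check weighted by specificity, might look roughly like this. The class name and the length-based `specificity` formula (capped at 20 characters) are assumptions made so the weighting is visible; they are not the library's definitions.

```python
class SketchType:
    """Stand-in type illustrating equality weighted by specificity."""

    def specificity(self, value):
        # Hypothetical measure: longer values count as more specific,
        # scaled into [0.0, 1.0] and capped at 20 characters.
        return min(len(value), 20) / 20

    def compare(self, left, right):
        if left.lower() == right.lower():
            return 1.0 * self.specificity(left)
        return 0.0


sketch = SketchType()
short_match = sketch.compare("Berlin", "berlin")
long_match = sketch.compare("Friedrichstrasse 123", "FRIEDRICHSTRASSE 123")
no_match = sketch.compare("Berlin", "Paris")
```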
compare_safe(left, right)
Variant of compare() that accepts None on either side.
Returns 0.0 if either argument is missing. Otherwise delegates to
compare().
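A None-tolerant wrapper of this kind is small enough to show in full. The sketch uses plain functions with a simplified `compare` (case-insensitive equality) as a stand-in for the real method.

```python
def compare(left, right):
    """Simplified stand-in: 1.0 on case-insensitive equality, else 0.0."""
    return 1.0 if left.lower() == right.lower() else 0.0


def compare_safe(left, right):
    """Return 0.0 when either side is missing, otherwise delegate."""
    if left is None or right is None:
        return 0.0
    return compare(left, right)
```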
compare_sets(left, right, func=max)
Score the similarity of two value sets by reducing pairwise comparisons.
Every element of left is compared to every element of right, and the
resulting scores are reduced with func — max by default, so the best
pairwise match wins. Returns 0.0 if either set is empty. Pass func=sum
or a statistical mean for alternative aggregation strategies.
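The pairwise reduction can be sketched as below, again with a simplified stand-in for `compare`. The function bodies are illustrative and are not copied from the library.

```python
from itertools import product
from statistics import mean


def compare(left, right):
    """Simplified stand-in: 1.0 on case-insensitive equality, else 0.0."""
    return 1.0 if left.lower() == right.lower() else 0.0


def compare_sets(left, right, func=max):
    """Compare every pair across the two collections and reduce the scores."""
    scores = [compare(l, r) for l, r in product(left, right)]
    if not scores:  # either side was empty
        return 0.0
    return func(scores)


best = compare_sets(["a", "b"], ["B", "c"])               # best pair wins
average = compare_sets(["a", "b"], ["B", "c"], func=mean)  # one match in four pairs
empty = compare_sets([], ["a"])
```

The choice of `func` changes the meaning of the score: `max` rewards a single strong pair, while a mean penalises sets that mostly disagree.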
country_hint(value)
Determine if the given value allows us to infer a country that it may be related to (e.g. using a country prefix on a phone number or IBAN).
join(values)
Render multiple values of this type as a single string.
Used when flattening multi-valued properties into formats that allow only
one value per cell (CSV, some RDF serializations). Values are joined with
;. The transformation is not reversible — use only at the final
serialization step.
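The sketch below shows why the join cannot be undone once a value itself contains the separator. The standalone `join` function is a simplified stand-in for the method.

```python
def join(values):
    """Flatten several values into one cell, separated by semicolons."""
    return ";".join(values)


original = ["Smith; John", "Doe"]
cell = join(original)
recovered = cell.split(";")  # splitting does not restore the original list
```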
node_id(value)
Build a graph node ID for a typed property value.
Used by graph exporters (Cypher, GEXF, Neo4j bulk) when reifying property values
into their own graph nodes — for example, turning every phone number
mentioned by any entity into a single node connected to the entities
that carry it. The default encoding is {type}:{value}, matching the
RDF URN form.
node_id_safe(value)
Wrapper for node_id to handle None values.
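To make the reification idea concrete, the sketch below builds `{type}:{value}` IDs and uses them as edge targets, so that two entities sharing a phone number point at the same node. The free functions, the `mentions` data and the edge tuples are hypothetical and only mirror the behaviour described above.

```python
def node_id(type_name, value):
    """Default encoding described above: the type name, a colon, the value."""
    return f"{type_name}:{value}"


def node_id_safe(type_name, value):
    """Skip missing values instead of building an ID for them."""
    if value is None:
        return None
    return node_id(type_name, value)


# (entity ID, type name, value) triples; purely illustrative data.
mentions = [
    ("person-1", "phone", "+442071234567"),
    ("company-7", "phone", "+442071234567"),
    ("person-2", "phone", None),
]

edges = []
for entity_id, type_name, value in mentions:
    target = node_id_safe(type_name, value)
    if target is not None:
        edges.append((entity_id, target))
```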
pick(values)
Choose the best representative value from a set of alternatives.
Used when a UI needs to display a single value for a multi-valued
property, or when reducing a set of similar values to a canonical form
(for example, picking the most complete variant of a name). Subclasses
that support picking — notably NameType — implement type-specific
heuristics. The base implementation raises NotImplementedError.
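The override pattern can be sketched as follows. The "longest value wins" rule is a placeholder heuristic invented for this example and should not be read as what NameType does; both class names are hypothetical.

```python
class SketchType:
    """Stand-in base type: picking is not supported by default."""

    def pick(self, values):
        raise NotImplementedError


class SketchNameType(SketchType):
    """Stand-in name type with a placeholder heuristic."""

    def pick(self, values):
        candidates = [v for v in values if v]
        if not candidates:
            return None
        # Placeholder rule: prefer the longest value, break ties alphabetically.
        return max(candidates, key=lambda v: (len(v), v))


picked = SketchNameType().pick(["J. Smith", "John Smith", ""])
```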
specificity(value)
Return a score for how specific the given value is. This can be used as a weighting factor in entity comparisons in order to rate matching property values by how specific they are. For example: a longer address is considered to be more specific than a short one, a full date more specific than just a year number, etc.
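One way to use specificity as a weighting factor is a weighted average over per-property match scores. Both functions below are hypothetical: `date_specificity` simply scales the length of an ISO date prefix, and `weighted_score` is an illustrative aggregation rather than the library's comparison logic.

```python
def date_specificity(value):
    """Hypothetical measure: a full ISO date (10 characters) scores 1.0."""
    return min(len(value), 10) / 10


def weighted_score(matches):
    """Average (score, weight) pairs, weighting each score by its specificity."""
    total_weight = sum(weight for _, weight in matches)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in matches) / total_weight


year_only = date_specificity("1980")
full_date = date_specificity("1980-05-17")

# A match on a bare year counts for less than a mismatch on a full date.
combined = weighted_score([(1.0, year_only), (0.0, full_date)])
```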
to_dict()
Return a serializable description of this data type.
validate(value, fuzzy=False, format=None)
Return True if the given value is a valid instance of the type, and False otherwise.