
Dates

Dates are given in a basic ISO 8601 date or date-time format, YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS.

In source data, we find varying degrees of precision: some events may be defined as a full timestamp (2021-08-25T09:26:11), while for many we only know a year (2021) or month (2021-08). Such date prefixes are carried through and used to specify date precision as well as the actual value.
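
To illustrate how a prefix encodes its own precision, here is a minimal sketch using only the standard library. The helper `precision_of` and the pattern `PREFIX_RE` are hypothetical names made up for this example; they are not part of followthemoney or prefixdate, and the sketch checks only the shape of the string, not whether the month or day is in range.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration: each finer field is optional,
# but may only appear if all coarser fields are present.
PREFIX_RE = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2}))?)?)?)?)?$"
)


def precision_of(value: str) -> Optional[str]:
    """Return the name of the finest field present in a date prefix, or None."""
    match = PREFIX_RE.match(value)
    if match is None:
        return None
    finest = None
    for name in ("year", "month", "day", "hour", "minute", "second"):
        if match.group(name) is not None:
            finest = name
    return finest


for value in ("2021", "2021-08", "2021-08-25T09:26:11", "August 2021"):
    print(value, "->", precision_of(value))
```

The value and its precision travel together in one string, so no separate precision field is needed.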

In the future, FtM may include support for approximate dates (~1968) and date ranges (1968-03 -- 1973-01).

| Attribute | Value | Detail |
| --- | --- | --- |
| name | date | Used in schema definitions |
| label | Date | plural: Dates |
| group | dates | Used in search indexing to query all properties of a given type |
| matchable | | Suitable for use in entity matching |
| pivot | | Suitable for use as a pivot point for connecting to other entities |

Temporal extent

Many schemata include annotations that select the properties defining the temporal start and end of a given entity type: for example, an Ownership may have a startDate and endDate, a Company an incorporationDate and dissolutionDate, and a Person a birthDate and deathDate.

Temporal extents are meant to provide semantics that help to project entity information into timelines.
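
As a rough sketch of how such annotations could be used to place an entity on a timeline, the example below works on plain dictionaries. The `TEMPORAL_PROPS` mapping and the `temporal_extent` function are hypothetical and only mirror the three examples named above; the real annotations are defined in the schema files and may be accessed differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from schema name to (start, end) property names,
# copied from the examples in the text for illustration only.
TEMPORAL_PROPS: Dict[str, Tuple[str, str]] = {
    "Ownership": ("startDate", "endDate"),
    "Company": ("incorporationDate", "dissolutionDate"),
    "Person": ("birthDate", "deathDate"),
}


def temporal_extent(
    schema: str, properties: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return the earliest start value and the latest end value, if present."""
    start_prop, end_prop = TEMPORAL_PROPS[schema]
    starts = properties.get(start_prop, [])
    ends = properties.get(end_prop, [])
    # ISO 8601 prefixes compare sensibly as plain strings; a shorter prefix
    # sorts before any longer value that begins with it.
    start = min(starts) if starts else None
    end = max(ends) if ends else None
    return start, end


company = {"incorporationDate": ["1998-03-12"], "name": ["Example Ltd."]}
print(temporal_extent("Company", company))
```

An entity with no end value is returned with `None` as its end, which a timeline view could treat as open-ended.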

Python API

Date validation and handling in Python is performed using the prefixdate Python library, which may eventually be subsumed in rigour.

followthemoney.types.DateType

Bases: PropertyType

A date or time stamp. This is based on ISO 8601, but meant to allow for different degrees of precision by specifying a prefix. This means that 2021, 2021-02, 2021-02-16, 2021-02-16T21, 2021-02-16T21:48 and 2021-02-16T21:48:52 are all valid values, with an implied precision.

The timezone is always expected to be UTC and cannot be specified otherwise. There is no support for calendar weeks (2021-W7) and date ranges (2021-2024).

Source code in followthemoney/types/date.py
class DateType(PropertyType):
    """A date or time stamp. This is based on ISO 8601, but meant to allow for different
    degrees of precision by specifying a prefix. This means that `2021`, `2021-02`,
    `2021-02-16`, `2021-02-16T21`, `2021-02-16T21:48` and `2021-02-16T21:48:52`
    are all valid values, with an implied precision.

    The timezone is always expected to be UTC and cannot be specified otherwise. There is
    no support for calendar weeks (`2021-W7`) and date ranges (`2021-2024`)."""

    name = const("date")
    group = const("dates")
    label = _("Date")
    plural = _("Dates")
    matchable = True
    max_length = 32

    def validate(
        self, value: str, fuzzy: bool = False, format: Optional[str] = None
    ) -> bool:
        """Check if a thing is a valid date."""
        if format is not None:
            prefix = parse_format(value, format)
        else:
            prefix = parse(value)
        return prefix.precision != Precision.EMPTY

    def clean_text(
        self,
        text: str,
        fuzzy: bool = False,
        format: Optional[str] = None,
        proxy: Optional["EntityProxy"] = None,
    ) -> Optional[str]:
        """The classic: date parsing, every which way."""
        if format is not None:
            return parse_format(text, format).text
        return parse(text).text

    def _specificity(self, value: str) -> float:
        return dampen(5, 13, value)

    def compare(self, left: str, right: str) -> float:
        prefix = os.path.commonprefix([left, right])
        return dampen(4, 10, prefix)

    def to_datetime(self, value: str) -> Optional[datetime]:
        """Convert a date string to a datetime object in UTC for handling in Python. This
        will convert the unset fields beyond the prefix to the first possible value, e.g.
        `2021-02` will become `2021-02-01T00:00:00Z`.

        Args:
            value (str): The date string to convert.

        Returns:
            Optional[datetime]: The parsed datetime object in UTC, or None if parsing fails.
        """
        return parse(value).dt

    def to_number(self, value: str) -> Optional[float]:
        """Convert a date string to a number, which is the number of seconds since the epoch
        (1970-01-01T00:00:00Z).

        Args:
            value (str): The date string to convert.

        Returns:
            Optional[float]: The timestamp as a float, or None if parsing fails.
        """
        date = self.to_datetime(value)
        if date is None:
            return None
        # We make a best effort all over the app to ensure all times are in UTC.
        if date.tzinfo is None:
            date = date.replace(tzinfo=timezone.utc)
        return date.timestamp()
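
The `compare` method in the listing above scores two dates by the length of their shared prefix. The sketch below reproduces that idea with the standard library. Note that `dampen_sketch` is a hypothetical stand-in: it assumes that `dampen(short, long, text)` scales the text length linearly from 0.0 at `short` characters to 1.0 at `long` characters, which is not confirmed by this page.

```python
import os


def dampen_sketch(short: int, long: int, text: str) -> float:
    """Assumed behaviour: map len(text) linearly onto [0.0, 1.0]."""
    length = len(text) - short
    baseline = max(1.0, float(long - short))
    return max(0.0, min(1.0, length / baseline))


def compare_sketch(left: str, right: str) -> float:
    """Score two date strings by the length of their common prefix."""
    prefix = os.path.commonprefix([left, right])
    return dampen_sketch(4, 10, prefix)


pairs = [
    ("2021-02-16", "2021-02-16"),  # identical days
    ("2021-02-16", "2021-02-20"),  # same month, different day
    ("2021", "2021"),              # only a year is known
    ("2021-02-16", "1999-01-01"),  # nothing in common
]
for left, right in pairs:
    print(left, right, round(compare_sketch(left, right), 3))
```

Under this assumption, two values that agree only on the year score 0.0, because the shared prefix is no longer than the lower bound of 4 characters.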

clean_text(text, fuzzy=False, format=None, proxy=None)

The classic: date parsing, every which way.

Source code in followthemoney/types/date.py
def clean_text(
    self,
    text: str,
    fuzzy: bool = False,
    format: Optional[str] = None,
    proxy: Optional["EntityProxy"] = None,
) -> Optional[str]:
    """The classic: date parsing, every which way."""
    if format is not None:
        return parse_format(text, format).text
    return parse(text).text
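
To show what cleaning with an explicit `format` might look like, here is a standard-library sketch. It assumes that `format` is a `strptime`-style pattern, which this page does not state; `clean_with_format` is a hypothetical stand-in and not the library function. Unlike the prefix-based approach described above, this sketch always emits a full day-precision date.

```python
from datetime import datetime
from typing import Optional


def clean_with_format(text: str, format: str) -> Optional[str]:
    """Parse `text` with a strptime pattern and return an ISO date, or None."""
    try:
        parsed = datetime.strptime(text.strip(), format)
    except ValueError:
        # The text does not match the given pattern.
        return None
    return parsed.date().isoformat()


print(clean_with_format("25.08.2021", "%d.%m.%Y"))
print(clean_with_format("not a date", "%d.%m.%Y"))
```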

to_datetime(value)

Convert a date string to a datetime object in UTC for handling in Python. This will convert the unset fields beyond the prefix to the first possible value, e.g. 2021-02 will become 2021-02-01T00:00:00Z.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | str | The date string to convert. | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[datetime] | The parsed datetime object in UTC, or None if parsing fails. |

Source code in followthemoney/types/date.py
def to_datetime(self, value: str) -> Optional[datetime]:
    """Convert a date string to a datetime object in UTC for handling in Python. This
    will convert the unset fields beyond the prefix to the first possible value, e.g.
    `2021-02` will become `2021-02-01T00:00:00Z`.

    Args:
        value (str): The date string to convert.

    Returns:
        Optional[datetime]: The parsed datetime object in UTC, or None if parsing fails.
    """
    return parse(value).dt
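
The rule of filling unset fields with their first possible value can be sketched with the standard library alone. The function `to_datetime_sketch` and the `TEMPLATE` constant are hypothetical names for this example; the actual conversion is delegated to prefixdate and may differ in detail, for instance in whether the result carries timezone information.

```python
from datetime import datetime, timezone
from typing import Optional

# Supplies the first possible value for every field beyond the prefix.
TEMPLATE = "0001-01-01T00:00:00"
# Lengths of well-formed prefixes: year, month, day, hour, minute, second.
PREFIX_LENGTHS = {4, 7, 10, 13, 16, 19}


def to_datetime_sketch(value: str) -> Optional[datetime]:
    """Pad a date prefix to a full timestamp and parse it as UTC."""
    if len(value) not in PREFIX_LENGTHS:
        return None
    padded = value + TEMPLATE[len(value):]
    try:
        parsed = datetime.fromisoformat(padded)
    except ValueError:
        return None
    # This sketch attaches UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


print(to_datetime_sketch("2021-02"))
print(to_datetime_sketch("2021-02-16T21:48"))
```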

to_number(value)

Convert a date string to a number, which is the number of seconds since the epoch (1970-01-01T00:00:00Z).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | str | The date string to convert. | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[float] | The timestamp as a float, or None if parsing fails. |

Source code in followthemoney/types/date.py
def to_number(self, value: str) -> Optional[float]:
    """Convert a date string to a number, which is the number of seconds since the epoch
    (1970-01-01T00:00:00Z).

    Args:
        value (str): The date string to convert.

    Returns:
        Optional[float]: The timestamp as a float, or None if parsing fails.
    """
    date = self.to_datetime(value)
    if date is None:
        return None
    # We make a best effort all over the app to ensure all times are in UTC.
    if date.tzinfo is None:
        date = date.replace(tzinfo=timezone.utc)
    return date.timestamp()
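
The `tzinfo` check in the listing above matters because `datetime.timestamp()` interprets a naive datetime in the machine's local timezone. The following sketch isolates that step using only the standard library; `to_number_sketch` is a hypothetical name and takes a datetime rather than a date string.

```python
from datetime import datetime, timedelta, timezone


def to_number_sketch(date: datetime) -> float:
    """Return seconds since the epoch, treating naive datetimes as UTC."""
    if date.tzinfo is None:
        # Without this, the result would depend on the local timezone.
        date = date.replace(tzinfo=timezone.utc)
    return date.timestamp()


naive = datetime(2021, 2, 1)
print(to_number_sketch(naive))  # 1612137600.0

# An aware datetime for the same instant gives the same number.
aware = datetime(2021, 2, 1, 1, 0, tzinfo=timezone(timedelta(hours=1)))
print(to_number_sketch(aware) == to_number_sketch(naive))  # True
```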

validate(value, fuzzy=False, format=None)

Check if a thing is a valid date.

Source code in followthemoney/types/date.py
def validate(
    self, value: str, fuzzy: bool = False, format: Optional[str] = None
) -> bool:
    """Check if a thing is a valid date."""
    if format is not None:
        prefix = parse_format(value, format)
    else:
        prefix = parse(value)
    return prefix.precision != Precision.EMPTY