Skip to content

URLs

Text URL property type is used for link references. Only the schemes http, https, ftp, and mailto are supported.

Attribute Value Detail
name url Used in schema definitions
label URL plural: URLs
group urls Used in search indexing to query all properties of a given type
matchable Suitable for use in entity matching
pivot Suitable for use as a pivot point for connecting to other entities

Python API

Validation and normalisation of URLs is performed by the functions in rigour.urls.

followthemoney.types.UrlType

Bases: PropertyType

A uniform resource locator (URL). This will perform some normalisation on the URL so that it's sure to be using valid encoding/quoting, and to make sure the URL has a schema (e.g. http, https, ...).

Source code in followthemoney/types/url.py
class UrlType(PropertyType):
    """A uniform resource locator (URL). This will perform some normalisation
    on the URL so that it's sure to be using valid encoding/quoting, and to
    make sure the URL has a schema (e.g. `http`, `https`, ...)."""

    SCHEMES = ("http", "https", "ftp", "mailto")
    DEFAULT_SCHEME = "http"

    name = "url"
    group = "urls"
    label = _("URL")
    plural = _("URLs")
    matchable = True
    pivot = True
    max_length = 4096

    def clean_text(
        self,
        text: str,
        fuzzy: bool = False,
        format: Optional[str] = None,
        proxy: Optional["EntityProxy"] = None,
    ) -> Optional[str]:
        """Perform intensive care on URLs to make sure they have a scheme
        and a host name. If no scheme is given HTTP is assumed."""
        return clean_url(text)

    def compare(self, left: str, right: str) -> float:
        """Compare two URLs and return a float indicating how similar they are. This ignores
        fragments and peforms hard URL normalisation."""
        return compare_urls(left, right)

    def _specificity(self, value: str) -> float:
        return dampen(10, 120, value)

clean_text(text, fuzzy=False, format=None, proxy=None)

Perform intensive care on URLs to make sure they have a scheme and a host name. If no scheme is given HTTP is assumed.

Source code in followthemoney/types/url.py
def clean_text(
    self,
    text: str,
    fuzzy: bool = False,
    format: Optional[str] = None,
    proxy: Optional["EntityProxy"] = None,
) -> Optional[str]:
    """Perform intensive care on URLs to make sure they have a scheme
    and a host name. If no scheme is given HTTP is assumed."""
    return clean_url(text)

compare(left, right)

Compare two URLs and return a float indicating how similar they are. This ignores fragments and peforms hard URL normalisation.

Source code in followthemoney/types/url.py
def compare(self, left: str, right: str) -> float:
    """Compare two URLs and return a float indicating how similar they are. This ignores
    fragments and peforms hard URL normalisation."""
    return compare_urls(left, right)