Welcome to radiant_mlhub’s documentation!

The Python client for the Radiant MLHub API.

Getting Started

This guide will walk you through the basic usage of the radiant_mlhub library, including:

  • Installing & configuring the library

  • Discovering & fetching datasets

  • Discovering & fetching collections

  • Downloading assets

Installation

Install with pip

$ pip install radiant_mlhub

Install with conda

$ conda install -c conda-forge radiant-mlhub

Configuration

If you have not done so already, you will need to register for an MLHub API key here.

Once you have your API key, you will need to create a default profile by setting up a .mlhub/profiles file in your home directory. You can use the mlhub configure command line tool to do this:

$ mlhub configure
API Key: Enter your API key here...
Wrote profile to /Users/youruser/.mlhub/profiles

Hint

If you do not have write access to the home directory on your machine, you can change the location of the profiles file using the MLHUB_HOME environment variable. For instance, setting MLHUB_HOME=/tmp/some-directory/.mlhub will cause the client to look for your profiles in a /tmp/some-directory/.mlhub/profiles file. You may want to permanently set this environment variable to ensure the client continues to look in the correct place for your profiles.
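
The path lookup described in the hint can be sketched as follows. This is a minimal illustration, not the client's actual code: profiles_path is a hypothetical helper, and the fallback to ~/.mlhub when MLHUB_HOME is unset is assumed from the default location mentioned above.

```python
import os
from pathlib import Path


def profiles_path(environ=None):
    """Return the profiles file location implied by MLHUB_HOME (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: without MLHUB_HOME the client falls back to ~/.mlhub
    home = environ.get("MLHUB_HOME")
    base = Path(home) if home else Path.home() / ".mlhub"
    return base / "profiles"


print(profiles_path({"MLHUB_HOME": "/tmp/some-directory/.mlhub"}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without changing your real environment.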

List Datasets

Once you have your profile configured, you can get a list of the available datasets from the Radiant MLHub API using the Dataset.list method. This method returns a list of Dataset instances. You can use the id and title properties to get more information about a dataset.

>>> from radiant_mlhub import Dataset
>>> for dataset in Dataset.list():
...     print(f'{dataset.id}: {dataset.title}')
bigearthnet_v1: BigEarthNet V1

Fetch a Dataset

You can also fetch a dataset by ID using the Dataset.fetch method. This method returns a Dataset instance.

>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> print(f'{dataset.id}: {dataset.title}')
bigearthnet_v1: BigEarthNet V1

Work with Dataset Collections

Datasets have 1 or more collections associated with them. Collections fall into 2 types:

  • source_imagery: Collections of source imagery associated with the dataset

  • labels: Collections of labeled data associated with the dataset (these collections implement the STAC Label Extension)

To list all the collections associated with a dataset use the collections attribute.

>>> dataset.collections
[<Collection id=bigearthnet_v1_source>, <Collection id=bigearthnet_v1_labels>]
>>> type(dataset.collections[0])
<class 'radiant_mlhub.models.Collection'>

You can also list the collections by type using the collections.source_imagery and collections.labels properties:

>>> from pprint import pprint
>>> len(dataset.collections.source_imagery)
1
>>> source_collection = dataset.collections.source_imagery[0]
>>> pprint(source_collection.to_dict())
{'description': 'BigEarthNet v1.0',
 'extent': {'spatial': {'bbox': [[-9.00023345437725,
                                  1.7542686833884724,
                                  83.44558248555553,
                                  68.02168200047284]]},
            'temporal': {'interval': [['2017-06-13T10:10:31Z',
                                       '2018-05-29T11:54:01Z']]}},
 'id': 'bigearthnet_v1_source',
 'keywords': [],
 'license': 'CDLA-Permissive-1.0',
 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source',
            'rel': 'self',
            'type': 'application/json'},
           {'href': 'https://api.radiant.earth/mlhub/v1',
            'rel': 'root',
            'type': 'application/json'}],
 'properties': {},
 'providers': [{'name': 'BigEarthNet',
                'roles': ['processor', 'licensor'],
                'url': 'https://api.radiant.earth/mlhub/v1/download/dummy-download-key'}],
 'sci:citation': 'G. Sumbul, M. Charfuelan, B. Demir, V. Markl, "BigEarthNet: '
                 'A Large-Scale Benchmark Archive for Remote Sensing Image '
                 'Understanding", IEEE International Geoscience and Remote '
                 'Sensing Symposium, pp. 5901-5904, Yokohama, Japan, 2019.',
 'stac_extensions': ['eo', 'sci'],
 'stac_version': '1.0.0-beta.2',
 'summaries': {},
 'title': None}

Download a Collection Archive

You can download all the assets associated with a collection using the Collection.download method. This method takes a path to a directory on the local file system where the archive should be saved.

If a file of the same name already exists, the client will check whether the downloaded file is complete by comparing its size against the size of the remote file. If they are the same size, the download is skipped, otherwise the download will be resumed from the point where it stopped. You can control this behavior using the if_exists argument. Setting this to "skip" will skip the download for existing files without checking for completeness (a bit faster since it doesn’t require a network request), and setting this to "overwrite" will overwrite any existing file.

>>> source_collection.download('~/Downloads')
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]
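
The if_exists rules described above can be summarised as a small decision function. This is an illustrative sketch only: plan_download is a hypothetical name and not part of radiant_mlhub, and it assumes "resume" is the default behavior, as the description of the default suggests.

```python
def plan_download(local_size, remote_size, if_exists="resume"):
    """Decide what to do about an existing archive file (illustrative only).

    local_size is None when no file exists yet.
    """
    if local_size is None or if_exists == "overwrite":
        return "download"  # start again from the first byte
    if if_exists == "skip":
        return "skip"  # no completeness check, so no network request
    if if_exists == "resume":
        # A file at least as large as the remote one is treated as complete.
        return "resume" if local_size < remote_size else "skip"
    raise ValueError(f"unknown if_exists value: {if_exists!r}")


print(plan_download(985, 3496))          # resume
print(plan_download(3496, 3496))         # skip
print(plan_download(985, 3496, "skip"))  # skip
```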

Collection archives are gzipped tarballs. You can read more about the structure of these archives in this Medium post.

Authentication

The Radiant MLHub API uses API keys to authenticate users. These keys must be passed as a key query parameter in any request made to the API. Anyone can register for an API key by going to https://dashboard.mlhub.earth and creating an account. Once you have logged into your account, go to http://dashboard.mlhub.earth/api-keys to create API keys.

Using API Keys

The best way to add your API key to requests is to create a Session instance using the get_session() helper function and make requests using this instance:

>>> from radiant_mlhub import get_session
>>> session = get_session()
>>> r = session.get(...)

You can associate an API key with a session in a number of ways:

  • programmatically via an instantiation argument

  • using environment variables

  • using a named profile

The Session resolves an API key by trying each of the following (in this order):

  1. Use an api_key argument provided during instantiation

    >>> session = get_session(api_key='myapikey')
    
  2. Use an MLHUB_API_KEY environment variable

    >>> import os
    >>> os.environ['MLHUB_API_KEY'] = 'myapikey'
    >>> session = get_session()
    
  3. Use a profile argument provided during instantiation (see Using Profiles section for details)

    >>> session = get_session(profile='my-profile')
    
  4. Use an MLHUB_PROFILE environment variable (see Using Profiles section for details)

    >>> os.environ['MLHUB_PROFILE'] = 'my-profile'
    >>> session = get_session()
    
  5. Use the default profile (see Using Profiles section for details)

    >>> session = get_session()
    

If none of the above strategies results in a valid API key, then an APIKeyNotFound exception is raised.
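
The lookup order above can be written out as a short function. This is a sketch that mirrors the documented order rather than the library's real implementation: resolve_api_key is a hypothetical name, profiles are passed in as a plain dict instead of being read from disk, and LookupError stands in for APIKeyNotFound.

```python
def resolve_api_key(api_key=None, profile=None, environ=None, profiles=None):
    """Mirror the documented lookup order; not the library's actual code."""
    environ = environ or {}
    profiles = profiles or {}
    if api_key:  # 1. explicit api_key argument
        return api_key
    if environ.get("MLHUB_API_KEY"):  # 2. MLHUB_API_KEY environment variable
        return environ["MLHUB_API_KEY"]
    # 3-5. profile argument, then MLHUB_PROFILE, then the default profile
    name = profile or environ.get("MLHUB_PROFILE") or "default"
    try:
        return profiles[name]["api_key"]
    except KeyError:
        # Stand-in for the APIKeyNotFound exception mentioned above.
        raise LookupError(f"no API key found for profile {name!r}") from None


profiles = {
    "default": {"api_key": "default_api_key"},
    "project1": {"api_key": "some_other_api_key"},
}
print(resolve_api_key(profiles=profiles))
print(resolve_api_key(environ={"MLHUB_PROFILE": "project1"}, profiles=profiles))
```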

The radiant_mlhub.session.Session instance inherits from requests.Session and adds a few conveniences to a typical session:

  • Adds the API key as a key query parameter

  • Adds an Accept: application/json header

  • Adds a User-Agent header that contains the package name and version, plus basic system information like the OS name

  • Prepends the MLHub root URL (https://api.radiant.earth/mlhub/v1/) to any request paths without a domain

  • Raises a radiant_mlhub.exceptions.AuthenticationError for 401 (UNAUTHORIZED) responses

Using Profiles

Profiles in radiant_mlhub are inspired by the Named Profiles used by boto3 and awscli. These named profiles provide a way to store API keys (and potentially other configuration) on your local system so that you do not need to explicitly set environment variables or pass in arguments every time you create a session.

All profile configuration must be stored in a .mlhub/profiles file in your home directory. The profiles file uses the INI file structure supported by Python’s configparser module as described here.

Hint

If you do not have write access to the home directory on your machine, you can change the location of the profiles file using the MLHUB_HOME environment variable. For instance, setting MLHUB_HOME=/tmp/some-directory/.mlhub will cause the client to look for your profiles in a /tmp/some-directory/.mlhub/profiles file. You may want to permanently set this environment variable to ensure the client continues to look in the correct place for your profiles.

The easiest way to configure a profile is using the mlhub configure CLI tool documented in the CLI Tools section:

$ mlhub configure
API Key: <Enter your API key when prompted>
Wrote profile to /Users/youruser/.mlhub/profiles

Given the following profiles file…

[default]
api_key = default_api_key

[project1]
api_key = some_other_api_key

[project2]
api_key = yet_another_api_key

These would be the API keys used by sessions created using the various methods described in Using API Keys:

>>> import os
>>> from radiant_mlhub import get_session

# As long as we haven't set the MLHUB_API_KEY or MLHUB_PROFILE environment variables
#  this will pull from the default profile
>>> session = get_session()
>>> session.params['key']
'default_api_key'

# Setting the MLHUB_PROFILE environment variable overrides the default profile
>>> os.environ['MLHUB_PROFILE'] = 'project1'
>>> session = get_session()
>>> session.params['key']
'some_other_api_key'

# Passing the profile argument directly overrides the MLHUB_PROFILE environment variable
>>> session = get_session(profile='project2')
>>> session.params['key']
'yet_another_api_key'

# Setting the MLHUB_API_KEY environment variable overrides any profile-related arguments
>>> os.environ['MLHUB_API_KEY'] = 'environment_direct'
>>> session = get_session()
>>> session.params['key']
'environment_direct'

# Passing the api_key argument overrides all other strategies for finding the key
>>> session = get_session(api_key='argument_direct')
>>> session.params['key']
'argument_direct'
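
Because the profiles file is a plain INI file, you can also inspect it yourself with Python's configparser module. The sketch below parses the example file shown earlier from a string; the keys are placeholders, and reading from a string rather than from .mlhub/profiles is just to keep the example self-contained.

```python
import configparser

# Same layout as the example profiles file above; the keys are placeholders.
PROFILES_TEXT = """
[default]
api_key = default_api_key

[project1]
api_key = some_other_api_key

[project2]
api_key = yet_another_api_key
"""

parser = configparser.ConfigParser()
parser.read_string(PROFILES_TEXT)

# Each section name is a profile name; each profile holds an api_key option.
for name in parser.sections():
    print(name, "->", parser[name]["api_key"])
```

Note that configparser only treats the upper-case DEFAULT section specially, so a lower-case [default] section is listed like any other profile.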

Making API Requests

Once you have your profiles file in place, you can create a session that will be used to make authenticated requests to the API:

>>> from radiant_mlhub import get_session
>>> session = get_session()

You can use this session to make authenticated calls to the API. For example, to list all collections:

>>> r = session.get('/collections')  # Leading slash is optional
>>> collections = r.json()['collections']
>>> print(len(collections))
47

Relative v. Absolute URLs

Any URLs that do not include a scheme (http://, https://) are assumed to be relative to the Radiant MLHub root URL. For instance, the following code would make a request to https://api.radiant.earth/mlhub/v1/some-endpoint:

>>> session.get('some-endpoint')

but the following code would make a request to https://www.google.com:

>>> session.get('https://www.google.com')

It is not recommended to make calls to APIs other than the Radiant MLHub API using these sessions.
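
The rule for relative and absolute URLs can be sketched with the standard library. This is an illustration of the behavior described above, not the session's actual code; resolve_url is a hypothetical helper and ROOT_URL is the root URL quoted in the Authentication section.

```python
from urllib.parse import urlparse

ROOT_URL = "https://api.radiant.earth/mlhub/v1/"


def resolve_url(url):
    """Return the URL a request would target under the rule described above."""
    if urlparse(url).scheme in ("http", "https"):
        return url  # absolute URL: used unchanged
    return ROOT_URL + url.lstrip("/")  # relative path: leading slash is optional


print(resolve_url("some-endpoint"))
print(resolve_url("/collections"))
print(resolve_url("https://www.google.com"))
```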

Collections

A collection represents either a group of related labels or a group of related source imagery for a given time period and geographic area. All collections in the Radiant MLHub API are valid STAC Collections. For instance, the ref_landcovernet_v1_source collection catalogs the source imagery associated with the LandCoverNet dataset, while the ref_landcovernet_v1_labels collection catalogs the land cover labels associated with this imagery. These collections are considered part of a single ref_landcovernet_v1 dataset (see the Datasets documentation for details on working with datasets).

To discover and fetch collections you can either use the low-level client methods from radiant_mlhub.client or the Collection class. Using the Collection class is the recommended approach, but both methods are described below.

Discovering Collections

The Radiant MLHub /collections endpoint returns a list of objects describing the available collections. You can use the low-level list_collections() function to work with these responses as native Python data types (list and dict). This function returns a list of JSON-like dictionaries representing STAC Collections.

>>> from radiant_mlhub.client import list_collections
>>> from pprint import pprint
>>> collections = list_collections()
>>> first_collection = collections[0]
>>> pprint(first_collection)
{'description': 'African Crops Kenya',
 'extent': {'spatial': {'bbox': [[34.18191992149459,
                                  0.4724181558451209,
                                  34.3714943155646,
                                  0.7144217206851109]]},
            'temporal': {'interval': [['2018-04-10T00:00:00Z',
                                       '2020-03-13T00:00:00Z']]}},
 'id': 'ref_african_crops_kenya_01_labels',
 'keywords': [],
 'license': 'CC-BY-SA-4.0',
 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels',
            'rel': 'self',
            'title': None,
            'type': 'application/json'},
           {'href': 'https://api.radiant.earth/mlhub/v1',
            'rel': 'root',
            'title': None,
            'type': 'application/json'}],
 'properties': {},
 'providers': [{'description': None,
                'name': 'Radiant Earth Foundation',
                'roles': ['licensor', 'host', 'processor'],
                'url': 'https://radiant.earth'}],
 'sci:citation': 'PlantVillage. (2019) PlantVillage Kenya Ground Reference '
                 'Crop Type Dataset, Version 1. [Indicate subset used]. '
                 'Radiant ML Hub. [Date Accessed]',
 'sci:doi': '10.34911/rdnt.u41j87',
 'stac_extensions': [],
 'stac_version': '1.0.0-beta.2',
 'summaries': {},
 'title': None}

You can also discover collections using the Collection.list method. This is the recommended way of listing collections. This method returns a list of Collection instances.

>>> from radiant_mlhub import Collection
>>> collections = Collection.list()
>>> first_collection = collections[0]
>>> first_collection.id
'ref_african_crops_kenya_01_labels'
>>> first_collection.description
'African Crops Kenya'

Fetching a Collection

The Radiant MLHub /collections/{p1} endpoint returns an object representing a single collection. You can use the low-level get_collection() function to work with this response as a dict.

>>> from radiant_mlhub.client import get_collection
>>> collection = get_collection('ref_african_crops_kenya_01_labels')
>>> pprint(collection)
{'description': 'African Crops Kenya',
 'extent': {'spatial': {'bbox': [[34.18191992149459,
                                  0.4724181558451209,
                                  34.3714943155646,
                                  0.7144217206851109]]},
            'temporal': {'interval': [['2018-04-10T00:00:00Z',
                                       '2020-03-13T00:00:00Z']]}},
 'id': 'ref_african_crops_kenya_01_labels',
 ...
 }

You can also fetch a collection from the Radiant MLHub API based on the collection ID using the Collection.fetch method. This is the recommended way of fetching a collection. This method returns a Collection instance.

>>> collection = Collection.fetch('ref_african_crops_kenya_01_labels')
>>> collection.id
'ref_african_crops_kenya_01_labels'
>>> collection.description
'African Crops Kenya'

For more information on a Collection, you can check out the MLHub Registry page:

>>> collection.registry_url
'https://registry.mlhub.earth/10.14279/depositonce-10149/'

Downloading a Collection

The Radiant MLHub /archive/{archive_id} endpoint allows you to download an archive of all assets associated with a given collection. You can use the low-level download_archive() function to download the archive to your local file system.

>>> from radiant_mlhub.client import download_archive
>>> archive_path = download_archive('sn1_AOI_1_RIO')
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]
>>> archive_path
PosixPath('/path/to/current/directory/sn1_AOI_1_RIO.tar.gz')

You can also download a collection archive using the Collection.download method. This is the recommended way of downloading an archive.

>>> collection = Collection.fetch('sn1_AOI_1_RIO')
>>> archive_path = collection.download('~/Downloads', if_exists='overwrite')  # Re-downloads even if the file already exists
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]
>>> archive_path
PosixPath('/Users/someuser/Downloads/sn1_AOI_1_RIO.tar.gz')

If a file of the same name already exists, these methods will check whether the downloaded file is complete by comparing its size against the size of the remote file. If they are the same size, the download is skipped, otherwise the download will be resumed from the point where it stopped. You can control this behavior using the if_exists argument. Setting this to "skip" will skip the download for existing files without checking for completeness (a bit faster since it doesn’t require a network request), and setting this to "overwrite" will overwrite any existing file.

To check the size of the download archive without actually downloading it, you can use the Collection.archive_size property.

>>> collection.archive_size
3504256089

Collection archives are gzipped tarballs. You can read more about the structure of these archives in this Medium post.

Datasets

A dataset represents a group of 1 or more related STAC Collections. A dataset groups together any source imagery Collections with the associated label Collections to provide a convenient mechanism for accessing all of these data together. For instance, the bigearthnet_v1_source Collection contains the source imagery for the BigEarthNet training dataset and, likewise, the bigearthnet_v1_labels Collection contains the annotations for that same dataset. These 2 collections are grouped together into the bigearthnet_v1 dataset.

The Radiant MLHub Training Data Registry provides an overview of the datasets available through the Radiant MLHub API along with dataset metadata and a listing of the associated Collections.

To discover and fetch datasets you can either use the low-level client methods from radiant_mlhub.client or the Dataset class. Using the Dataset class is the recommended approach, but both methods are described below.

Note

The objects returned by the Radiant MLHub API dataset endpoints are not STAC-compliant objects and therefore the Dataset class described below is not a PySTAC object.

Discovering Datasets

The Radiant MLHub /datasets endpoint returns a list of objects describing the available datasets and their associated collections. You can use the low-level list_datasets() function to work with these responses as native Python data types (list and dict). This function returns a list containing a dict for each dataset.

>>> from radiant_mlhub.client import list_datasets
>>> from pprint import pprint
>>> datasets = list_datasets()
>>> first_dataset = datasets[0]
>>> pprint(first_dataset)
{'collections': [{'id': 'bigearthnet_v1_source', 'types': ['source_imagery']},
             {'id': 'bigearthnet_v1_labels', 'types': ['labels']}],
 'id': 'bigearthnet_v1',
 'title': 'BigEarthNet V1'}

You can also discover datasets using the Dataset.list method. This is the recommended way of listing datasets. This method returns a list of Dataset instances.

>>> from radiant_mlhub import Dataset
>>> datasets = Dataset.list()
>>> first_dataset = datasets[0]
>>> first_dataset.id
'bigearthnet_v1'
>>> first_dataset.title
'BigEarthNet V1'

Each of these functions/methods also accepts tags and text arguments that can be used to filter datasets by their tags or a free text search, respectively. The tags argument may be either a single string or a list of strings. Only datasets that contain all of the provided tags will be returned, and these tags must be an exact match. The text argument may, similarly, be either a string or a list of strings. These will be used to search all of the text-based metadata fields for a dataset (e.g. description, title, citation, etc.). Each argument is treated as a phrase by the text search engine and only datasets with matches for all of the provided phrases will be returned. So, for instance, text=["land", "cover"] will return all datasets with both "land" and "cover" somewhere in their text metadata, while text="land cover" will return all datasets with the phrase "land cover" in their text metadata.
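
The matching rules above can be illustrated with a small local filter. This is only a sketch of the semantics: the real filtering happens on the server, the matches function and the two records are made up for illustration, and case-insensitive substring matching is an assumption standing in for the text search engine.

```python
def matches(dataset, tags=None, text=None):
    """Apply the documented AND semantics to one metadata dict (local stand-in)."""
    tags = [tags] if isinstance(tags, str) else (tags or [])
    text = [text] if isinstance(text, str) else (text or [])
    # Every requested tag must match one of the dataset's tags exactly.
    if not all(tag in dataset["tags"] for tag in tags):
        return False
    # Every phrase must appear somewhere in the text metadata.
    haystack = " ".join([dataset["title"], dataset["description"]]).lower()
    return all(phrase.lower() in haystack for phrase in text)


# Made-up records for illustration; real metadata comes from the API.
datasets = [
    {"id": "a", "title": "Land Cover Map", "description": "", "tags": ["landcover"]},
    {"id": "b", "title": "Cloud cover over land", "description": "", "tags": ["clouds"]},
]

print([d["id"] for d in datasets if matches(d, text=["land", "cover"])])  # ['a', 'b']
print([d["id"] for d in datasets if matches(d, text="land cover")])       # ['a']
```

The second call returns fewer results because a single string is treated as one phrase, while a list requires each entry to match independently.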

Fetching a Dataset

The Radiant MLHub /datasets/{dataset_id} endpoint returns an object representing a single dataset. You can use the low-level get_dataset_by_id() function to work with this response as a dict.

>>> from radiant_mlhub.client import get_dataset_by_id
>>> dataset = get_dataset_by_id('bigearthnet_v1')
>>> pprint(dataset)
{'collections': [{'id': 'bigearthnet_v1_source', 'types': ['source_imagery']},
             {'id': 'bigearthnet_v1_labels', 'types': ['labels']}],
 'id': 'bigearthnet_v1',
 'title': 'BigEarthNet V1'}

You can also fetch a dataset from the Radiant MLHub API based on the dataset ID using the Dataset.fetch_by_id method. This is the recommended way of fetching a dataset. This method returns a Dataset instance.

>>> dataset = Dataset.fetch_by_id('bigearthnet_v1')
>>> dataset.id
'bigearthnet_v1'

If you would rather fetch the dataset using its DOI you can do so as well:

>>> from radiant_mlhub.client import get_dataset_by_doi
>>> # Using the client...
>>> dataset = get_dataset_by_doi("10.6084/m9.figshare.12047478.v2")
>>> # Using the model classes...
>>> dataset = Dataset.fetch_by_doi("10.6084/m9.figshare.12047478.v2")

You can also use the more general get_dataset() and Dataset.fetch methods to get a dataset using either an ID or a DOI:

>>> from radiant_mlhub.client import get_dataset
>>> # These will all return the same dataset
>>> dataset = get_dataset("ref_african_crops_kenya_02")
>>> dataset = get_dataset("10.6084/m9.figshare.12047478.v2")
>>> dataset = Dataset.fetch("ref_african_crops_kenya_02")
>>> dataset = Dataset.fetch("10.6084/m9.figshare.12047478.v2")

Dataset Collections

If you are using the Dataset class, you can list the Collections associated with the dataset using the Dataset.collections property. This property returns a modified list that has 2 additional attributes: source_imagery and labels. You can use these attributes to list only the collections of the associated type. All elements of these lists are instances of Collection. See the Collections documentation for details on how to work with these instances.

>>> len(first_dataset.collections)
2
>>> len(first_dataset.collections.source_imagery)
1
>>> first_dataset.collections.source_imagery[0].id
'bigearthnet_v1_source'
>>> len(first_dataset.collections.labels)
1
>>> first_dataset.collections.labels[0].id
'bigearthnet_v1_labels'

Warning

There are rare cases of collections that contain both source_imagery and labels items (e.g. the SpaceNet collections). In these cases, the collection will be listed in both the dataset.collections.labels and dataset.collections.source_imagery lists, but will only appear once in the main dataset.collections list. This may cause what appears to be a mismatch in list lengths:

>>> len(dataset.collections.source_imagery) + len(dataset.collections.labels) == len(dataset.collections)
False
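
The mismatch is easy to reproduce with plain dictionaries shaped like the collections entries in a dataset response. The IDs below are made up for illustration; the second entry carries both types, so it is counted once in the full list and once in each filtered list.

```python
# Collection entries shaped like the 'collections' list in a dataset response.
# The IDs are made up; the second entry carries both types.
collections = [
    {"id": "example_source", "types": ["source_imagery"]},
    {"id": "example_mixed", "types": ["source_imagery", "labels"]},
]

source_imagery = [c for c in collections if "source_imagery" in c["types"]]
labels = [c for c in collections if "labels" in c["types"]]

print(len(collections), len(source_imagery), len(labels))  # 2 2 1
```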

Note

Both the low-level client functions and the class methods also accept keyword arguments that are passed directly to get_session() to create a session. See the Authentication documentation for details on how to use these arguments or configure the client to read your API key automatically.

Downloading a Dataset

The Radiant MLHub /archive/{archive_id} endpoint allows you to download an archive of all assets associated with a given collection. The Dataset.download method provides a convenient way of using this endpoint to download the archives for all collections associated with a given dataset. This method downloads the archives for all associated collections into the given output directory and returns a list of the paths to these archives.

If a file of the same name already exists for any of the archives, this method will check whether the downloaded file is complete by comparing its size against the size of the remote file. If they are the same size, the download is skipped, otherwise the download will be resumed from the point where it stopped. You can control this behavior using the if_exists argument. Setting this to "skip" will skip the download for existing files without checking for completeness (a bit faster since it doesn’t require a network request), and setting this to "overwrite" will overwrite any existing file.

>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> archive_paths = dataset.download('~/Downloads')
>>> len(archive_paths)
2

To check the total size of the download archives for all collections in the dataset without actually downloading them, you can use the Dataset.total_archive_size property.

>>> dataset.total_archive_size
71311240007

Collection archives are gzipped tarballs. You can read more about the structure of these archives in this Medium post.

ML Models

A Model represents a STAC Item implementing the ML Model extension. The goal of the ML Model Extension is to provide a way of cataloging machine learning (ML) models that operate on Earth observation (EO) data described as a STAC catalog.

To discover and fetch models you can either use the low-level client methods from radiant_mlhub.client or the MLModel class. Using the MLModel class is the recommended approach, but both methods are described below.

Discovering Models

The Radiant MLHub /models endpoint returns a list of objects describing the available models and their properties. You can use the low-level list_models() function to work with these responses as native Python data types (list and dict).

>>> from radiant_mlhub.client import list_models
>>> models = list_models()
>>> first_model = models[0]
>>> first_model.keys()
dict_keys(['id', 'bbox', 'type', 'links', 'assets', 'geometry', 'collection', 'properties', 'stac_version', 'stac_extensions'])
>>> first_model['id']
'model-cyclone-wind-estimation-torchgeo-v1'
>>> first_model['properties'].keys()
dict_keys(['title', 'license', 'sci:doi', 'datetime', 'providers', 'description', 'end_datetime', 'sci:citation', 'ml-model:type', 'start_datetime', 'sci:publications', 'ml-model:architecture', 'ml-model:prediction_type', 'ml-model:learning_approach'])

You can also discover models using the MLModel.list method. This is the recommended way of listing models. This method returns a list of MLModel instances.

>>> from radiant_mlhub import MLModel
>>> from pprint import pprint
>>> models = MLModel.list()
>>> first_model = models[0]
>>> first_model.id
'model-cyclone-wind-estimation-torchgeo-v1'
>>> pprint(first_model.assets)
{'inferencing-checkpoint': <Asset href=https://zenodo.org/record/5773331/files/last.ckpt?download=1>,
 'inferencing-compose': <Asset href=https://raw.githubusercontent.com/RadiantMLHub/cyclone-model-torchgeo/main/inferencing.yml>}
>>> len(first_model.links)
7
>>> # print only the ml-model and mlhub related links
>>> pprint([ link for link in first_model.links if 'ml-model:' in link.rel or 'mlhub:' in link.rel])
[<Link rel=ml-model:inferencing-image target=docker://docker.io/radiantearth/cyclone-model-torchgeo:1>,
 <Link rel=ml-model:train-data target=https://api.radiant.earth/mlhub/v1/collections/nasa_tropical_storm_competition_train_source>,
 <Link rel=mlhub:training-dataset target=https://mlhub.earth/data/nasa_tropical_storm_competition>]
>>> # you can access rest of properties as a dict
>>> first_model.properties.keys()
dict_keys(['title', 'license', 'sci:doi', 'datetime', 'providers', 'description', 'end_datetime', 'sci:citation', 'ml-model:type', 'start_datetime', 'sci:publications', 'ml-model:architecture', 'ml-model:prediction_type', 'ml-model:learning_approach'])

Fetching a Model

The Radiant MLHub /models/{model_id} endpoint returns an object representing a single model. You can use the low-level get_model_by_id() function to work with this response as a dict.

>>> from radiant_mlhub.client import get_model_by_id
>>> model = get_model_by_id('model-cyclone-wind-estimation-torchgeo-v1')
>>> model.keys()
dict_keys(['id', 'bbox', 'type', 'links', 'assets', 'geometry', 'collection', 'properties', 'stac_version', 'stac_extensions'])

You can also fetch a model from the Radiant MLHub API based on the model ID using the MLModel.fetch method. This is the recommended way of fetching a model. This method returns an MLModel instance.

>>> from radiant_mlhub import MLModel
>>> model = MLModel.fetch('model-cyclone-wind-estimation-torchgeo-v1')
>>> model.id
'model-cyclone-wind-estimation-torchgeo-v1'
>>> len(model.assets)
2
>>> len(model.links)
7

See the Discovering Models section above for more Python example code.

radiant_mlhub package

Subpackages

radiant_mlhub.client package

Submodules
radiant_mlhub.client.collections module
radiant_mlhub.client.collections.get_collection(collection_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) -> Dict[str, Any]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{p1} endpoint.

See the MLHub API docs for details.

Parameters
  • collection_id (str) – The ID of the collection to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable

  • profile (str) – A profile to use when making this request.

Returns

collection

Return type

dict

Raises
radiant_mlhub.client.collections.get_collection_item(collection_id: str, item_id: str, api_key: Optional[str] = None, profile: Optional[str] = None) -> Dict[str, Any]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{p1}/items/{p2} endpoint.

Parameters
  • collection_id (str) – The ID of the Collection to which the Item belongs.

  • item_id (str) – The ID of the Item.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable

  • profile (str) – A profile to use when making this request.

Returns

item

Return type

dict

radiant_mlhub.client.collections.list_collection_items(collection_id: str, *, page_size: Optional[int] = None, extensions: Optional[List[str]] = None, limit: int = 10, api_key: Optional[str] = None, profile: Optional[str] = None) -> Iterator[Dict[str, Any]]

Yields JSON-like dictionaries representing STAC Item objects returned by the Radiant MLHub GET /collections/{collection_id}/items endpoint.

Note

Because some collections may contain hundreds of thousands of items, this function limits the total number of responses to 10 by default. You can change this value by increasing the value of the limit keyword argument, or setting it to None to list all items. Be aware that trying to list all items in a large collection may take a very long time.

Parameters
  • collection_id (str) – The ID of the collection from which to fetch items

  • page_size (int) – The number of items to return in each page. If set to None, then this parameter will not be passed to the API and the default API value will be used (currently 30).

  • extensions (list) – If provided, then only items that support all of the extensions listed will be returned.

  • limit (int) – The maximum total number of items to yield. Defaults to 10.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable

  • profile (str) – A profile to use when making this request.

Yields

item (dict) – JSON-like dictionary representing a STAC Item associated with the given collection.
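
The interaction between page_size and limit can be sketched with a generator that fetches pages lazily and stops once the cap is reached. This is a minimal illustration under stated assumptions, not the library's implementation: iter_items and fake_page are hypothetical names, and the fake API simply slices a list of 75 made-up items into pages of 30.

```python
from itertools import islice


def iter_items(fetch_page, limit=10):
    """Yield items page by page, stopping after `limit` items (None = no cap)."""

    def all_items():
        page = 0
        while True:
            items = fetch_page(page)
            if not items:
                return  # an empty page means there is nothing left
            yield from items
            page += 1

    # islice stops pulling from the generator once `limit` items were produced,
    # so later pages are never requested.
    return islice(all_items(), limit)


# Stand-in for the API: 75 fake items served 30 per page.
FAKE_ITEMS = [{"id": f"item-{i}"} for i in range(75)]


def fake_page(page, page_size=30):
    return FAKE_ITEMS[page * page_size:(page + 1) * page_size]


print(len(list(iter_items(fake_page))))              # 10
print(len(list(iter_items(fake_page, limit=None))))  # 75
```

With the default limit only the first page is fetched, which is why listing a handful of items stays fast even for very large collections.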

radiant_mlhub.client.collections.list_collections(*, api_key: Optional[str] = None, profile: Optional[str] = None) -> List[Dict[str, Any]]

Gets a list of JSON-like dictionaries representing STAC Collection objects returned by the Radiant MLHub GET /collections endpoint.

See the MLHub API docs for details.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collections – List of JSON-like dictionaries representing STAC Collection objects.

Return type

List[dict]

radiant_mlhub.client.datasets module
class radiant_mlhub.client.datasets.ArchiveInfo(*args, **kwargs)[source]

Bases: dict

collection: str
dataset: str
size: int
types: List[str]
radiant_mlhub.client.datasets.download_archive(archive_id: str, output_dir: Optional[Union[str, pathlib.Path]] = None, *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) pathlib.Path[source]

Downloads the archive with the given ID to an output location (current working directory by default).

The if_exists argument determines how to handle an existing archive file in the output directory. The default behavior (defined by if_exists="resume") is to resume the download by requesting a byte range starting at the size of the existing file. If the existing file is the same size as the file to be downloaded (as determined by the Content-Length header), then the download is skipped. You can automatically skip download using if_exists="skip" (this may be faster if you know the download was not interrupted, since no network request is made to get the archive size). You can also overwrite the existing file using if_exists="overwrite".

Parameters
  • archive_id (str) – The ID of the archive to download. This is the same as the Collection ID.

  • output_dir (Path) – Path to which the archive will be downloaded. Defaults to the current working directory.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_path – The full path to the downloaded archive file.

Return type

Path

Raises

ValueError – If if_exists is not one of "skip", "overwrite", or "resume".
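
The three if_exists modes can be summarized as a small decision function. This is a sketch of the rules as described above, not the library's implementation; plan_download and its return values are hypothetical, and remote_size stands for the value of the Content-Length header. Unlike the real "skip" mode, which makes no request for the archive size, the sketch takes the size as an argument in every case for simplicity.

```python
from typing import Optional, Tuple

VALID_IF_EXISTS = ("skip", "overwrite", "resume")


def plan_download(
    if_exists: str, existing_size: Optional[int], remote_size: int
) -> Tuple[str, int]:
    """Return (action, start_byte) for a download.

    `existing_size` is None when no local file exists yet.
    """
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(f"if_exists must be one of {VALID_IF_EXISTS}")
    if existing_size is None or if_exists == "overwrite":
        return ("download", 0)  # fetch the whole file
    if if_exists == "skip":
        return ("skip", 0)  # a file exists: do nothing
    # if_exists == "resume"
    if existing_size < remote_size:
        return ("download", existing_size)  # request only the remaining bytes
    return ("skip", 0)  # the existing file is already complete
```

For example, plan_download("resume", 100, 250) asks for the bytes from offset 100 onward, while the same call with an existing size of 250 skips the download.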

radiant_mlhub.client.datasets.get_archive_info(archive_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Gets info for the given archive from the /archive/{archive_id}/info endpoint as a JSON-like dictionary.

The JSON object returned by the API has the following properties:

  • collection: The ID of the Collection that this archive is associated with.

  • dataset: The ID of the dataset that this archive’s Collection belongs to.

  • size: The size of the archive (in bytes)

  • types: The types associated with this archive’s Collection. Will be one of "source_imagery" or "label".

Parameters
  • archive_id (str) – The ID of the archive. This is the same as the Collection ID.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

archive_info – JSON-like dictionary representing the API response.

Return type

dict
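
As an illustration of working with the returned dictionary, the sketch below formats the four documented properties into a one-line summary. The helper describe_archive and the sample values are made up for this example; only the key names come from the property list above.

```python
from typing import Any, Dict


def describe_archive(info: Dict[str, Any]) -> str:
    """Summarize an archive-info dictionary in one line."""
    size_mb = info["size"] / 1_000_000  # bytes -> megabytes (decimal)
    types = ", ".join(info["types"])
    return (
        f"{info['collection']} (dataset {info['dataset']}): "
        f"{size_mb:.1f} MB, types: {types}"
    )


# Made-up values, for illustration only.
sample_info = {
    "collection": "example_collection",
    "dataset": "example_dataset",
    "size": 2_500_000,
    "types": ["source_imagery"],
}
summary = describe_archive(sample_info)
```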

radiant_mlhub.client.datasets.get_dataset(dataset_id_or_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing a dataset by first trying to look up the dataset by ID, then falling back to finding the dataset by DOI.

See the MLHub API docs for details.

Parameters
  • dataset_id_or_doi (str) – The ID or DOI of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict
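
The "ID first, then DOI" lookup order can be sketched with two injected lookup functions. Everything here is hypothetical: the NotFound exception, the helper names and the sample data are illustrative only, and the condition that triggers the fallback in the real client is not stated in this reference.

```python
from typing import Any, Callable, Dict


class NotFound(Exception):
    """Hypothetical error: no dataset matches the given key."""


def get_by_id_or_doi(
    key: str,
    by_id: Callable[[str], Dict[str, Any]],
    by_doi: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Try the ID lookup first; fall back to the DOI lookup on NotFound."""
    try:
        return by_id(key)
    except NotFound:
        return by_doi(key)


# In-memory stand-ins for the two endpoints (made-up data).
_DATASETS = [{"id": "example_dataset", "doi": "10.0000/example"}]


def fake_by_id(dataset_id: str) -> Dict[str, Any]:
    for dataset in _DATASETS:
        if dataset["id"] == dataset_id:
            return dataset
    raise NotFound(dataset_id)


def fake_by_doi(doi: str) -> Dict[str, Any]:
    for dataset in _DATASETS:
        if dataset["doi"] == doi:
            return dataset
    raise NotFound(doi)


found_by_id = get_by_id_or_doi("example_dataset", fake_by_id, fake_by_doi)
found_by_doi = get_by_id_or_doi("10.0000/example", fake_by_id, fake_by_doi)
```

If neither lookup succeeds, the error from the DOI lookup propagates to the caller.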

radiant_mlhub.client.datasets.get_dataset_by_doi(dataset_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /datasets/doi/{dataset_doi} endpoint.

See the MLHub API docs for details.

Parameters
  • dataset_doi (str) – The DOI of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict

radiant_mlhub.client.datasets.get_dataset_by_id(dataset_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /datasets/{dataset_id} endpoint.

See the MLHub API docs for details.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict

radiant_mlhub.client.datasets.list_datasets(*, tags: Optional[Union[str, Iterable[str]]] = None, text: Optional[Union[str, Iterable[str]]] = None, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dict[str, Any]][source]

Gets a list of JSON-like dictionaries representing dataset objects returned by the Radiant MLHub GET /datasets endpoint.

See the MLHub API docs for details.

Parameters
  • tags (str or Iterable[str], optional) – A tag or list of tags to filter datasets by. If not None, only datasets containing all provided tags will be returned.

  • text (str or Iterable[str], optional) – A text phrase or list of text phrases to filter datasets by. If not None, only datasets containing all phrases will be returned.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

datasets

Return type

List[dict]
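
The "all provided tags" semantics, and the fact that tags may be a single string or an iterable of strings, can be illustrated offline. This sketch is not how the client or the API filters datasets; the helpers as_list and has_all_tags are hypothetical, and a "tags" key on the dataset dictionary is assumed purely for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional, Union

Filter = Optional[Union[str, Iterable[str]]]


def as_list(value: Filter) -> Optional[List[str]]:
    """Normalize a single string or an iterable of strings to a list."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value]  # a bare string counts as one tag, not as characters
    return list(value)


def has_all_tags(dataset: Dict[str, Any], tags: Filter) -> bool:
    """True if the dataset carries every requested tag (or no filter is set)."""
    wanted = as_list(tags)
    if wanted is None:
        return True
    # "tags" is an assumed key, used here only for illustration.
    return set(wanted) <= set(dataset.get("tags", []))


example = {"id": "example_dataset", "tags": ["crop", "africa"]}
```

Normalizing the string case first avoids the common mistake of iterating over the characters of a single tag.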

radiant_mlhub.client.ml_models module
radiant_mlhub.client.ml_models.get_model_by_id(model_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /models/{model_id} endpoint.

See the MLHub API docs for details.

Parameters
  • model_id (str) – The ID of the ML Model to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

model

Return type

dict

radiant_mlhub.client.ml_models.list_models(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dict[str, Any]][source]

Gets a list of JSON-like dictionaries representing ML Model objects returned by the Radiant MLHub GET /models endpoint.

See the MLHub API docs for details.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

models

Return type

List[dict]

Module contents

Low-level functions for making requests to MLHub API endpoints.

radiant_mlhub.client.download_archive(archive_id: str, output_dir: Optional[Union[str, pathlib.Path]] = None, *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) pathlib.Path[source]

Downloads the archive with the given ID to an output location (current working directory by default).

The if_exists argument determines how to handle an existing archive file in the output directory. The default behavior (defined by if_exists="resume") is to resume the download by requesting a byte range starting at the size of the existing file. If the existing file is the same size as the file to be downloaded (as determined by the Content-Length header), then the download is skipped. You can automatically skip download using if_exists="skip" (this may be faster if you know the download was not interrupted, since no network request is made to get the archive size). You can also overwrite the existing file using if_exists="overwrite".

Parameters
  • archive_id (str) – The ID of the archive to download. This is the same as the Collection ID.

  • output_dir (Path) – Path to which the archive will be downloaded. Defaults to the current working directory.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_path – The full path to the downloaded archive file.

Return type

Path

Raises

ValueError – If if_exists is not one of "skip", "overwrite", or "resume".
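
A thin wrapper around this function might look like the following sketch. The wrapper name fetch_archive is hypothetical; it checks if_exists locally and then delegates to download_archive with the keyword arguments shown in the signature above. The import is done lazily inside the function, so the validation can be exercised without the package, a network connection, or an API key.

```python
from pathlib import Path
from typing import Union

VALID_IF_EXISTS = {"skip", "overwrite", "resume"}


def fetch_archive(
    archive_id: str,
    output_dir: Union[str, Path] = ".",
    if_exists: str = "resume",
) -> Path:
    """Validate `if_exists` locally, then delegate to download_archive."""
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(f"if_exists must be one of {sorted(VALID_IF_EXISTS)}")
    # Imported lazily: requires radiant_mlhub and a configured profile.
    from radiant_mlhub import client

    return client.download_archive(
        archive_id, output_dir=output_dir, if_exists=if_exists
    )
```

A call such as fetch_archive("bigearthnet_v1_source", "./data") would then perform a real download, which needs network access and valid credentials.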

radiant_mlhub.client.get_archive_info(archive_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Gets info for the given archive from the /archive/{archive_id}/info endpoint as a JSON-like dictionary.

The JSON object returned by the API has the following properties:

  • collection: The ID of the Collection that this archive is associated with.

  • dataset: The ID of the dataset that this archive’s Collection belongs to.

  • size: The size of the archive (in bytes)

  • types: The types associated with this archive’s Collection. Will be one of "source_imagery" or "label".

Parameters
  • archive_id (str) – The ID of the archive. This is the same as the Collection ID.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

archive_info – JSON-like dictionary representing the API response.

Return type

dict

radiant_mlhub.client.get_collection(collection_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{collection_id} endpoint.

See the MLHub API docs for details.

Parameters
  • collection_id (str) – The ID of the collection to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collection

Return type

dict

Raises
radiant_mlhub.client.get_collection_item(collection_id: str, item_id: str, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{collection_id}/items/{item_id} endpoint.

Parameters
  • collection_id (str) – The ID of the Collection to which the Item belongs.

  • item_id (str) – The ID of the Item.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

item

Return type

dict

radiant_mlhub.client.get_dataset(dataset_id_or_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing a dataset by first trying to look up the dataset by ID, then falling back to finding the dataset by DOI.

See the MLHub API docs for details.

Parameters
  • dataset_id_or_doi (str) – The ID or DOI of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict

radiant_mlhub.client.get_dataset_by_doi(dataset_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /datasets/doi/{dataset_doi} endpoint.

See the MLHub API docs for details.

Parameters
  • dataset_doi (str) – The DOI of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict

radiant_mlhub.client.get_dataset_by_id(dataset_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /datasets/{dataset_id} endpoint.

See the MLHub API docs for details.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

dict

radiant_mlhub.client.get_model_by_id(model_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dict[str, Any][source]

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /models/{model_id} endpoint.

See the MLHub API docs for details.

Parameters
  • model_id (str) – The ID of the ML Model to fetch

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

model

Return type

dict

radiant_mlhub.client.list_collection_items(collection_id: str, *, page_size: Optional[int] = None, extensions: Optional[List[str]] = None, limit: int = 10, api_key: Optional[str] = None, profile: Optional[str] = None) Iterator[Dict[str, Any]][source]

Yields JSON-like dictionaries representing STAC Item objects returned by the Radiant MLHub GET /collections/{collection_id}/items endpoint.

Note

Because some collections may contain hundreds of thousands of items, this function yields at most 10 items by default. You can change this by passing a different value for the limit keyword argument, or by setting it to None to list all items. Be aware that listing all items in a large collection may take a very long time.

Parameters
  • collection_id (str) – The ID of the collection from which to fetch items

  • page_size (int) – The number of items to return in each page. If set to None, then this parameter will not be passed to the API and the default API value will be used (currently 30).

  • extensions (list) – If provided, then only items that support all of the extensions listed will be returned.

  • limit (int) – The maximum total number of items to yield. Defaults to 10.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Yields

item (dict) – JSON-like dictionary representing a STAC Item associated with the given collection.
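
Because the function yields plain dictionaries, its output can be consumed like any other iterator. The sketch below collects the "id" field that every STAC Item carries; the helper names item_ids and list_item_ids are hypothetical, and the second helper imports the client lazily so that the first can be tried offline with made-up items.

```python
from typing import Any, Dict, Iterable, List


def item_ids(items: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the `id` field of each STAC Item dictionary."""
    return [item["id"] for item in items]


def list_item_ids(collection_id: str, limit: int = 10) -> List[str]:
    """Fetch up to `limit` items of a collection and return their IDs."""
    # Imported lazily: requires radiant_mlhub and a configured API key.
    from radiant_mlhub import client

    return item_ids(client.list_collection_items(collection_id, limit=limit))


# Offline check with a generator of made-up items.
fake_items = ({"id": f"item_{i}", "type": "Feature"} for i in range(3))
ids = item_ids(fake_items)
```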

radiant_mlhub.client.list_collections(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dict[str, Any]][source]

Gets a list of JSON-like dictionaries representing STAC Collection objects returned by the Radiant MLHub GET /collections endpoint.

See the MLHub API docs for details.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collections – List of JSON-like dictionaries representing STAC Collection objects.

Return type

List[dict]

radiant_mlhub.client.list_datasets(*, tags: Optional[Union[str, Iterable[str]]] = None, text: Optional[Union[str, Iterable[str]]] = None, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dict[str, Any]][source]

Gets a list of JSON-like dictionaries representing dataset objects returned by the Radiant MLHub GET /datasets endpoint.

See the MLHub API docs for details.

Parameters
  • tags (str or Iterable[str], optional) – A tag or list of tags to filter datasets by. If not None, only datasets containing all provided tags will be returned.

  • text (str or Iterable[str], optional) – A text phrase or list of text phrases to filter datasets by. If not None, only datasets containing all phrases will be returned.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

datasets

Return type

List[dict]

radiant_mlhub.client.list_models(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dict[str, Any]][source]

Gets a list of JSON-like dictionaries representing ML Model objects returned by the Radiant MLHub GET /models endpoint.

See the MLHub API docs for details.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

models

Return type

List[dict]

radiant_mlhub.models package

Submodules
radiant_mlhub.models.collection module

Extensions of the PySTAC classes that provide convenience methods for interacting with the Radiant MLHub API.

class radiant_mlhub.models.collection.Collection(id: str, description: str, extent: pystac.collection.Extent, title: Optional[str] = None, stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, extra_fields: Optional[Dict[str, Any]] = None, catalog_type: Optional[pystac.catalog.CatalogType] = None, license: str = 'proprietary', keywords: Optional[List[str]] = None, providers: Optional[List[pystac.provider.Provider]] = None, summaries: Optional[pystac.summaries.Summaries] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.collection.Collection

Class inheriting from pystac.Collection that adds some convenience methods for listing and fetching from the Radiant MLHub API.

property archive_size: Optional[int]

The size of the tarball archive for this collection in bytes (or None if the archive does not exist).

download(output_dir: Union[str, pathlib.Path], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) pathlib.Path[source]

Downloads the archive for this collection to the given output directory (output_dir). If the parent directories for the output path do not exist, they will be created.

The if_exists argument determines how to handle an existing archive file in the output directory. See the documentation for the download_archive() function for details. The default behavior is to resume downloading if the existing file is incomplete and skip the download if it is complete.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (Path) – Path to a local directory to which the file will be downloaded. File name will be generated automatically based on the download URL.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_path – The path to the downloaded archive file.

Return type

pathlib.Path

Raises

FileExistsError – If file at output_path already exists and both exist_okay and overwrite are False.

classmethod fetch(collection_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Creates a Collection instance by fetching the collection with the given ID from the Radiant MLHub API.

Parameters
  • collection_id (str) – The ID of the collection to fetch (e.g. bigearthnet_v1_source).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collection

Return type

Collection

fetch_item(item_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) pystac.item.Item[source]
classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Patches the pystac.Collection.from_dict() method so that it returns the calling class instead of always returning a pystac.Collection instance.
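
The general pattern of making an inherited constructor return the calling class can be shown with plain Python classes. The classes below are hypothetical and unrelated to PySTAC's actual code; the sketch only contrasts a parent that hard-codes its own type with an override that re-wraps the result using cls.

```python
from typing import Any, Dict


class Base:
    def __init__(self, id: str) -> None:
        self.id = id

    @classmethod
    def from_dict_fixed(cls, d: Dict[str, Any]) -> "Base":
        """Always builds a Base, even when called on a subclass."""
        return Base(d["id"])


class Sub(Base):
    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Sub":
        """Build via the parent, then re-wrap in the calling class."""
        base = Base.from_dict_fixed(d)
        return cls(base.id)


class SubSub(Sub):
    """Inherits from_dict and still gets instances of its own type."""


plain = Sub.from_dict_fixed({"id": "a"})   # a Base instance
patched = Sub.from_dict({"id": "a"})       # a Sub instance
nested = SubSub.from_dict({"id": "a"})     # a SubSub instance
```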

get_items(*, api_key: Optional[str] = None, profile: Optional[str] = None) Iterator[pystac.item.Item][source]

Note

The get_items method is not implemented for Radiant MLHub Collection instances for performance reasons. Please use the Collection.download() method to download Collection assets.

Raises

NotImplementedError

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Collection][source]

Returns a list of Collection instances for all collections hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collections

Return type

List[Collection]

property registry_url: Optional[str]

The URL of the registry page for this Collection. The URL is based on the DOI identifier for the collection. If the Collection does not have a "sci:doi" property then registry_url will be None.

radiant_mlhub.models.dataset module

Extensions of the PySTAC classes that provide convenience methods for interacting with the Radiant MLHub API.

class radiant_mlhub.models.dataset.CollectionType(value)[source]

Bases: enum.Enum

Valid values for the type of a collection associated with a Radiant MLHub dataset.

LABELS = 'labels'
SOURCE = 'source_imagery'
class radiant_mlhub.models.dataset.Dataset(id: str, collections: List[Dict[str, Any]], title: Optional[str] = None, registry: Optional[str] = None, doi: Optional[str] = None, citation: Optional[str] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None, **_: Any)[source]

Bases: object

Class that brings together multiple Radiant MLHub “collections” that are all considered part of a single “dataset”. For instance, the bigearthnet_v1 dataset is composed of both a source imagery collection (bigearthnet_v1_source) and a labels collection (bigearthnet_v1_labels).

id

The dataset ID.

Type

str

title

The title of the dataset (or None if dataset has no title).

Type

str or None

registry_url

The URL to the registry page for this dataset, or None if no registry page exists.

Type

str or None

doi

The DOI identifier for this dataset, or None if there is no DOI for this dataset.

Type

str or None

citation

The citation information for this dataset, or None if there is no citation information.

Type

str or None

property collections: radiant_mlhub.models.dataset._CollectionList

List of collections associated with this dataset. The list that is returned has 2 additional attributes (source_imagery and labels) that represent the list of collections corresponding to each type.
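
One way to picture a list that also exposes per-type views is a list subclass with two properties. The class TypedCollectionList is a hypothetical model, not the library's _CollectionList; the enum is re-declared locally with the values documented for CollectionType so that the sketch runs on its own.

```python
from enum import Enum
from typing import List, Tuple


class CollectionType(Enum):
    # Mirrors the values documented for radiant_mlhub's CollectionType.
    LABELS = "labels"
    SOURCE = "source_imagery"


Entry = Tuple[str, List[CollectionType]]  # (collection ID, its types)


class TypedCollectionList(list):
    """A list of entries with extra per-type views (hypothetical model)."""

    def _of_type(self, kind: CollectionType) -> List[Entry]:
        return [entry for entry in self if kind in entry[1]]

    @property
    def source_imagery(self) -> List[Entry]:
        return self._of_type(CollectionType.SOURCE)

    @property
    def labels(self) -> List[Entry]:
        return self._of_type(CollectionType.LABELS)


collections = TypedCollectionList([
    ("bigearthnet_v1_source", [CollectionType.SOURCE]),
    ("bigearthnet_v1_labels", [CollectionType.LABELS]),
])
```

Iterating over collections visits every entry, while collections.source_imagery and collections.labels each return only the matching subset.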

Note

This is a cached property, so updating self.collection_descriptions after calling self.collections the first time will have no effect on the results. See functools.cached_property() for details on clearing the cached value.
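
The caching behaviour mentioned in this note can be reproduced with a standalone class. The Example class is hypothetical; the sketch relies only on the standard-library functools.cached_property (Python 3.8 or later), whose cached value is cleared by deleting the attribute.

```python
from functools import cached_property


class Example:
    def __init__(self, descriptions):
        self.descriptions = descriptions
        self.computed = 0  # counts how often the property body runs

    @cached_property
    def names(self):
        self.computed += 1
        return [d["id"] for d in self.descriptions]


ex = Example([{"id": "a"}])
first = ex.names                               # computed and cached
ex.descriptions = [{"id": "a"}, {"id": "b"}]   # update the underlying data
stale = ex.names                               # still the cached value
del ex.names                                   # clear the cache
fresh = ex.names                               # recomputed from the new data
```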

Examples

>>> from radiant_mlhub import Dataset
>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> len(dataset.collections)
2
>>> len(dataset.collections.source_imagery)
1
>>> len(dataset.collections.labels)
1

To loop through all collections:

>>> for collection in dataset.collections:
...     pass  # Do something here

To loop through only the source imagery collections:

>>> for collection in dataset.collections.source_imagery:
...     pass  # Do something here

To loop through only the label collections:

>>> for collection in dataset.collections.labels:
...     pass  # Do something here

download(output_dir: Union[pathlib.Path, str], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) List[pathlib.Path][source]

Downloads archives for all collections associated with this dataset to the given directory. Each archive will be named using the collection ID (e.g. some_collection.tar.gz). If output_dir does not exist, it will be created.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (str or pathlib.Path) – The directory into which the archives will be written.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_paths – List of paths to the downloaded archives

Return type

List[pathlib.Path]

Raises
  • IOError – If output_dir exists and is not a directory.

  • FileExistsError – If one of the archive files already exists in the output_dir and both exist_okay and overwrite are False.
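
Given the naming rule described above (one archive per collection, named after the collection ID), the expected output paths can be computed ahead of time. The helper expected_archive_paths is hypothetical and assumes the .tar.gz suffix shown in the example name.

```python
from pathlib import Path
from typing import Iterable, List, Union


def expected_archive_paths(
    output_dir: Union[str, Path], collection_ids: Iterable[str]
) -> List[Path]:
    """Return output_dir/<collection_id>.tar.gz for each collection ID."""
    base = Path(output_dir)
    return [base / f"{collection_id}.tar.gz" for collection_id in collection_ids]


paths = expected_archive_paths(
    "data", ["bigearthnet_v1_source", "bigearthnet_v1_labels"]
)
names = [path.name for path in paths]
```

Such a list can be handy for checking which archives are already present before starting a download.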

classmethod fetch(dataset_id_or_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by first trying to fetch the dataset by ID, then falling back to fetching by DOI.

Parameters
  • dataset_id_or_doi (str) – The ID or DOI of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod fetch_by_doi(dataset_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given DOI from the Radiant MLHub API.

Parameters
  • dataset_doi (str) – The DOI of the dataset to fetch (e.g. 10.6084/m9.figshare.12047478.v2).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod fetch_by_id(dataset_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given ID from the Radiant MLHub API.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod list(*, tags: Optional[Union[str, Iterable[str]]] = None, text: Optional[Union[str, Iterable[str]]] = None, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dataset][source]

Returns a list of Dataset instances for all datasets hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • tags (str or Iterable[str], optional) – A tag or list of tags to filter datasets by. If not None, only datasets containing all provided tags will be returned.

  • text (str or Iterable[str], optional) – A text phrase or list of text phrases to filter datasets by. If not None, only datasets containing all phrases will be returned.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Yields

dataset (Dataset)

property total_archive_size: Optional[int]

Gets the total size (in bytes) of the archives for all collections associated with this dataset. If no archives exist, returns None.
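
The aggregation rule can be sketched as a sum over per-collection sizes in which a missing archive is represented by None. The helper total_size is hypothetical, and treating individual missing archives as "skipped" is an assumption; the only documented behaviour is that None is returned when no archives exist.

```python
from typing import Iterable, Optional


def total_size(sizes: Iterable[Optional[int]]) -> Optional[int]:
    """Sum the known sizes; return None when no archive has a size."""
    known = [size for size in sizes if size is not None]
    return sum(known) if known else None
```

Returning None rather than 0 keeps "no archives exist" distinguishable from "archives exist but are empty".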

radiant_mlhub.models.ml_model module

Extensions of the PySTAC classes that provide convenience methods for interacting with the Radiant MLHub API.

class radiant_mlhub.models.ml_model.MLModel(id: str, geometry: Optional[Dict[str, Any]], bbox: Optional[List[float]], datetime: Optional[datetime.datetime], properties: Dict[str, Any], stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, collection: Optional[Union[str, pystac.collection.Collection]] = None, extra_fields: Optional[Dict[str, Any]] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.item.Item

assets: Dict[str, Asset]

Dictionary of Asset objects, each with a unique key.

bbox: Optional[List[float]]

Bounding Box of the asset represented by this item using either 2D or 3D geometries. The length of the array is 2*n where n is the number of dimensions. Could also be None in the case of a null geometry.
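
The 2*n length rule can be checked with a few lines of code. The helper bbox_dimensions is hypothetical; it accepts only the 2D and 3D cases named above and passes a null bounding box through as None.

```python
from typing import List, Optional


def bbox_dimensions(bbox: Optional[List[float]]) -> Optional[int]:
    """Return n for a bbox of length 2*n, or None for a null geometry."""
    if bbox is None:
        return None
    if len(bbox) not in (4, 6):
        raise ValueError("bbox must have 4 (2D) or 6 (3D) values")
    return len(bbox) // 2


dims_2d = bbox_dimensions([-10.0, -5.0, 10.0, 5.0])
dims_3d = bbox_dimensions([-10.0, -5.0, 0.0, 10.0, 5.0, 100.0])
```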

collection: Optional[Collection]

Collection to which this Item belongs, if any.

collection_id: Optional[str]

The Collection ID that this item belongs to, if any.

datetime: Optional[Datetime]

Datetime associated with this item. If None, then start_datetime and end_datetime in common_metadata will supply the datetime range of the Item.

extra_fields: Dict[str, Any]

Extra fields that are part of the top-level JSON fields of the Item.

classmethod fetch(model_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Fetches an MLModel instance by ID.

Parameters
  • model_id (str) – The ID of the ML Model to fetch (e.g. model-cyclone-wind-estimation-torchgeo-v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

model

Return type

MLModel

classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Patches the pystac.Item.from_dict() method so that it returns the calling class instead of always returning a pystac.Item instance.

geometry: Optional[Dict[str, Any]]

Defines the full footprint of the asset represented by this item, formatted according to RFC 7946, section 3.1 (GeoJSON).

id: str

Provider identifier. Unique within the STAC.

links: List[Link]

A list of Link objects representing all links associated with this Item.

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[radiant_mlhub.models.ml_model.MLModel][source]

Returns a list of MLModel instances for all models hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

models

Return type

List[MLModel]
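A minimal sketch of how list might be used to build a lookup table of models. The helper names index_by_id and hosted_models are hypothetical; hosted_models assumes network access and a resolvable API key, so it defers the import until it is called.

```python
from typing import Any, Dict, Iterable


def index_by_id(items: Iterable[Any]) -> Dict[str, Any]:
    """Build an {id: item} lookup from objects exposing an ``id`` attribute."""
    return {item.id: item for item in items}


def hosted_models(**auth: Any) -> Dict[str, Any]:
    """Index every model returned by MLModel.list() by its ID.

    ``auth`` may carry the ``api_key`` or ``profile`` keyword arguments.
    Requires network access, so the import is deferred until call time.
    """
    from radiant_mlhub import MLModel

    return index_by_id(MLModel.list(**auth))
```

Calling hosted_models(profile='project1') would then return a dictionary that can be queried with an ID such as model-cyclone-wind-estimation-torchgeo-v1, assuming a profile of that name has been configured.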

properties: Dict[str, Any]

A dictionary of additional metadata for the Item.

session_kwargs: Dict[str, Any] = {}

Class inheriting from pystac.Item that adds some convenience methods for listing and fetching from the Radiant MLHub API.

stac_extensions: List[str]

List of extensions the Item implements.

Module contents

Extensions of the PySTAC classes that provide convenience methods for interacting with the Radiant MLHub API.

class radiant_mlhub.models.Collection(id: str, description: str, extent: pystac.collection.Extent, title: Optional[str] = None, stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, extra_fields: Optional[Dict[str, Any]] = None, catalog_type: Optional[pystac.catalog.CatalogType] = None, license: str = 'proprietary', keywords: Optional[List[str]] = None, providers: Optional[List[pystac.provider.Provider]] = None, summaries: Optional[pystac.summaries.Summaries] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.collection.Collection

Class inheriting from pystac.Collection that adds some convenience methods for listing and fetching from the Radiant MLHub API.

property archive_size: Optional[int]

The size of the tarball archive for this collection in bytes (or None if the archive does not exist).

download(output_dir: Union[str, pathlib.Path], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) pathlib.Path[source]

Downloads the archive for this collection to the given output directory. If the parent directories for output_path do not exist, they will be created.

The if_exists argument determines how to handle an existing archive file in the output directory. See the documentation for the download_archive() function for details. The default behavior is to resume downloading if the existing file is incomplete and skip the download if it is complete.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (Path) – Path to a local directory to which the file will be downloaded. File name will be generated automatically based on the download URL.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_path – The path to the downloaded archive file.

Return type

pathlib.Path

Raises

FileExistsError – If file at output_path already exists and both exist_okay and overwrite are False.
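The if_exists rules described in the parameter list can be sketched as a small decision function. This is an illustration of the documented behavior only, not the client's implementation: plan_download is a hypothetical name, and remote_size stands in for the value of the Content-Length header.

```python
from typing import Optional, Tuple


def plan_download(existing_size: Optional[int], remote_size: int,
                  if_exists: str = "resume") -> Tuple[str, int]:
    """Return (action, start_byte) for a download.

    existing_size is None when no local file exists yet.
    """
    if if_exists not in ("skip", "overwrite", "resume"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")
    if existing_size is None or if_exists == "overwrite":
        return "download", 0            # fetch the whole file
    if if_exists == "skip":
        return "skip", 0                # keep whatever is on disk
    if existing_size < remote_size:
        return "resume", existing_size  # fetch only the missing tail
    return "skip", 0                    # local file is already complete
```

How the remaining bytes are actually requested from the server is outside the scope of this sketch.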

classmethod fetch(collection_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Creates a Collection instance by fetching the collection with the given ID from the Radiant MLHub API.

Parameters
  • collection_id (str) – The ID of the collection to fetch (e.g. bigearthnet_v1_source).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collection

Return type

Collection

fetch_item(item_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) pystac.item.Item[source]

classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Patches the pystac.Collection.from_dict() method so that it returns the calling class instead of always returning a pystac.Collection instance.

get_items(*, api_key: Optional[str] = None, profile: Optional[str] = None) Iterator[pystac.item.Item][source]

Note

The get_items method is not implemented for Radiant MLHub Collection instances for performance reasons. Please use the Collection.download() method to download Collection assets.

Raises

NotImplementedError

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Collection][source]

Returns a list of Collection instances for all collections hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collections

Return type

List[Collection]

property registry_url: Optional[str]

The URL of the registry page for this Collection. The URL is based on the DOI identifier for the collection. If the Collection does not have a "sci:doi" property then registry_url will be None.

class radiant_mlhub.models.Dataset(id: str, collections: List[Dict[str, Any]], title: Optional[str] = None, registry: Optional[str] = None, doi: Optional[str] = None, citation: Optional[str] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None, **_: Any)[source]

Bases: object

Class that brings together multiple Radiant MLHub “collections” that are all considered part of a single “dataset”. For instance, the bigearthnet_v1 dataset is composed of both a source imagery collection (bigearthnet_v1_source) and a labels collection (bigearthnet_v1_labels).

id

The dataset ID.

Type

str

title

The title of the dataset (or None if dataset has no title).

Type

str or None

registry_url

The URL to the registry page for this dataset, or None if no registry page exists.

Type

str or None

doi

The DOI identifier for this dataset, or None if there is no DOI for this dataset.

Type

str or None

citation

The citation information for this dataset, or None if there is no citation information.

Type

str or None

property collections: radiant_mlhub.models.dataset._CollectionList

List of collections associated with this dataset. The list that is returned has 2 additional attributes (source_imagery and labels) that represent the lists of collections corresponding to each type.

Note

This is a cached property, so updating self.collection_descriptions after calling self.collections the first time will have no effect on the results. See functools.cached_property() for details on clearing the cached value.

Examples

>>> from radiant_mlhub import Dataset
>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> len(dataset.collections)
2
>>> len(dataset.collections.source_imagery)
1
>>> len(dataset.collections.labels)
1

To loop through all collections:

>>> for collection in dataset.collections:
...     pass  # Do something here

To loop through only the source imagery collections:

>>> for collection in dataset.collections.source_imagery:
...     pass  # Do something here

To loop through only the label collections:

>>> for collection in dataset.collections.labels:
...     pass  # Do something here

download(output_dir: Union[pathlib.Path, str], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) List[pathlib.Path][source]

Downloads archives for all collections associated with this dataset to the given directory. Each archive will be named using the collection ID (e.g. some_collection.tar.gz). If output_dir does not exist, it will be created.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (str or pathlib.Path) – The directory into which the archives will be written.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_paths – List of paths to the downloaded archives

Return type

List[pathlib.Path]

Raises
  • IOError – If output_dir exists and is not a directory.

  • FileExistsError – If one of the archive files already exists in the output_dir and both exist_okay and overwrite are False.

classmethod fetch(dataset_id_or_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by first trying to fetch the dataset by ID, then falling back to fetching by DOI.

Parameters
  • dataset_id_or_doi (str) – The ID or DOI of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset
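The ID-first, DOI-second strategy can be sketched generically as below. The function name fetch_with_fallback and its parameters are hypothetical. Which exception the client treats as "not found" is not stated here, so the sketch takes the exception types as an argument and defaults to LookupError purely for illustration.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_fallback(key: str,
                        by_id: Callable[[str], T],
                        by_doi: Callable[[str], T],
                        not_found: Tuple[Type[BaseException], ...] = (LookupError,)) -> T:
    """Try an ID lookup first and fall back to a DOI lookup.

    Only the ``not_found`` exceptions trigger the fallback; any other
    error raised by ``by_id`` propagates unchanged.
    """
    try:
        return by_id(key)
    except not_found:
        return by_doi(key)
```

With the real client, by_id and by_doi would correspond to Dataset.fetch_by_id and Dataset.fetch_by_doi.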

classmethod fetch_by_doi(dataset_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given DOI from the Radiant MLHub API.

Parameters
  • dataset_doi (str) – The DOI of the dataset to fetch (e.g. 10.6084/m9.figshare.12047478.v2).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod fetch_by_id(dataset_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given ID from the Radiant MLHub API.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod list(*, tags: Optional[Union[str, Iterable[str]]] = None, text: Optional[Union[str, Iterable[str]]] = None, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dataset][source]

Returns a list of Dataset instances, one for each dataset hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • tags (str or Iterable[str], optional) – A tag or list of tags to filter datasets by. If not None, only datasets containing all provided tags will be returned.

  • text (str or Iterable[str], optional) – A text phrase or list of text phrases to filter datasets by. If not None, only datasets containing all phrases will be returned.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

datasets

Return type

List[Dataset]
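The "containing all provided tags" rule can be illustrated locally as follows. The filtering itself is presumably applied by the API; the helpers as_list and has_all_tags are hypothetical and only mirror the documented semantics, including the fact that tags may be given as a single string or as an iterable of strings.

```python
from typing import Iterable, List, Optional, Union


def as_list(value: Optional[Union[str, Iterable[str]]]) -> List[str]:
    """Normalize None, a single string, or an iterable of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def has_all_tags(dataset_tags: Iterable[str],
                 required: Optional[Union[str, Iterable[str]]]) -> bool:
    """True when every required tag is present; no filter matches everything."""
    return set(as_list(required)) <= set(dataset_tags)
```

A dataset tagged only with one of two requested tags would therefore be excluded, while passing None applies no filter at all.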

property total_archive_size: Optional[int]

Gets the total size (in bytes) of the archives for all collections associated with this dataset. If no archives exist, returns None.
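A sketch of the aggregation described here, assuming each collection reports either a byte count or None. Both helper names are hypothetical; human_readable is included only because byte totals for large datasets are hard to read at a glance.

```python
from typing import Iterable, Optional


def total_size(archive_sizes: Iterable[Optional[int]]) -> Optional[int]:
    """Sum the known archive sizes; None when no archive reports a size."""
    known = [size for size in archive_sizes if size is not None]
    return sum(known) if known else None


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"  # unreachable; keeps the return type explicit
```

With the real client, the input could be built from the archive_size property of each collection in dataset.collections.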

class radiant_mlhub.models.MLModel(id: str, geometry: Optional[Dict[str, Any]], bbox: Optional[List[float]], datetime: Optional[datetime.datetime], properties: Dict[str, Any], stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, collection: Optional[Union[str, pystac.collection.Collection]] = None, extra_fields: Optional[Dict[str, Any]] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.item.Item

assets: Dict[str, Asset]

Dictionary of Asset objects, each with a unique key.

bbox: Optional[List[float]]

Bounding Box of the asset represented by this item using either 2D or 3D geometries. The length of the array is 2*n where n is the number of dimensions. Could also be None in the case of a null geometry.

collection: Optional[Collection]

Collection to which this Item belongs, if any.

collection_id: Optional[str]

The Collection ID that this item belongs to, if any.

datetime: Optional[Datetime]

Datetime associated with this item. If None, then start_datetime and end_datetime in common_metadata will supply the datetime range of the Item.

extra_fields: Dict[str, Any]

Extra fields that are part of the top-level JSON fields of the Item.

classmethod fetch(model_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Fetches an MLModel instance by ID.

Parameters
  • model_id (str) – The ID of the ML Model to fetch (e.g. model-cyclone-wind-estimation-torchgeo-v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

model

Return type

MLModel

classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Patches the pystac.Item.from_dict() method so that it returns the calling class instead of always returning a pystac.Item instance.

geometry: Optional[Dict[str, Any]]

Defines the full footprint of the asset represented by this item, formatted according to RFC 7946, section 3.1 (GeoJSON).

id: str

Provider identifier. Unique within the STAC.

links: List[Link]

A list of Link objects representing all links associated with this Item.

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[radiant_mlhub.models.ml_model.MLModel][source]

Returns a list of MLModel instances for all models hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

models

Return type

List[MLModel]

properties: Dict[str, Any]

A dictionary of additional metadata for the Item.

session_kwargs: Dict[str, Any] = {}

Class inheriting from pystac.Item that adds some convenience methods for listing and fetching from the Radiant MLHub API.

stac_extensions: List[str]

List of extensions the Item implements.

Submodules

radiant_mlhub.exceptions module

exception radiant_mlhub.exceptions.APIKeyNotFound[source]

Bases: radiant_mlhub.exceptions.MLHubException

Raised when an API key cannot be found using any of the strategies described in the Authentication docs.

exception radiant_mlhub.exceptions.AuthenticationError[source]

Bases: radiant_mlhub.exceptions.MLHubException

Raised when the Radiant MLHub API cannot authenticate the request, either because the API key is invalid or expired, or because no API key was provided in the request.

exception radiant_mlhub.exceptions.EntityDoesNotExist[source]

Bases: radiant_mlhub.exceptions.MLHubException

Raised when attempting to fetch a collection that does not exist in the Radiant MLHub API.

exception radiant_mlhub.exceptions.MLHubException[source]

Bases: Exception

Base exception class for all Radiant MLHub exceptions.

radiant_mlhub.session module

Methods and classes to simplify constructing and authenticating requests to the MLHub API.

It is generally recommended that you use the get_session() function to create sessions, since this will properly handle resolution of the API key from function arguments, environment variables, and profiles as described in Authentication. See the get_session() docs for usage examples.

class radiant_mlhub.session.Session(*, api_key: Optional[str])[source]

Bases: requests.sessions.Session

Custom class inheriting from requests.Session with some additional conveniences:

  • Adds the API key as a key query parameter

  • Adds an Accept: application/json header

  • Adds a User-Agent header that contains the package name and version, plus basic system information like the OS name

  • Prepends the MLHub root URL (https://api.radiant.earth/mlhub/v1/) to any request paths without a domain

  • Raises a radiant_mlhub.exceptions.AuthenticationError for 401 (UNAUTHORIZED) responses

  • Calls requests.Response.raise_for_status() after all requests to raise exceptions for any status codes of 400 or above.

API_KEY_ENV_VARIABLE = 'MLHUB_API_KEY'
DEFAULT_ROOT_URL = 'https://api.radiant.earth/mlhub/v1/'
MLHUB_HOME_ENV_VARIABLE = 'MLHUB_HOME'
PROFILE_ENV_VARIABLE = 'MLHUB_PROFILE'
ROOT_URL_ENV_VARIABLE = 'MLHUB_ROOT_URL'

classmethod from_config(profile: Optional[str] = None) radiant_mlhub.session.Session[source]

Create a session object by reading an API key from the given profile in the profiles file. By default, the client will look for the profiles file in a .mlhub directory in the user’s home directory (as determined by Path.home). However, if an MLHUB_HOME environment variable is present, the client will look in that directory instead.

Parameters

profile (str, optional) – The name of a profile configured in the profiles file.

Returns

session

Return type

Session

Raises

APIKeyNotFound – If the given config file does not exist, the given profile cannot be found, or there is no api_key property in the given profile section.
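The lookup location described above can be sketched as follows. The function name profiles_path is hypothetical; the behavior mirrors the documented rule that MLHUB_HOME, when set, replaces the default .mlhub directory in the user's home directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def profiles_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the expected location of the profiles file.

    ``env`` defaults to the process environment; passing a plain dict
    makes the function easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    home = env.get("MLHUB_HOME")
    base = Path(home) if home else Path.home() / ".mlhub"
    return base / "profiles"
```

For example, with MLHUB_HOME set to /tmp/some-directory/.mlhub the function returns /tmp/some-directory/.mlhub/profiles.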

classmethod from_env() radiant_mlhub.session.Session[source]

Create a session object using the API key found in the MLHUB_API_KEY environment variable.

Returns

session

Return type

Session

Raises

APIKeyNotFound – If the API key cannot be found in the environment

paginate(url: str, **kwargs: Any) Iterator[Dict[str, Any]][source]

Makes a GET request to the given url and paginates through all results by looking for a link in each response with a rel type of "next". Any additional keyword arguments are passed directly to requests.Session.get().

Parameters

url (str) – The URL to which the initial request will be made. Note that this may either be a full URL or a path relative to the ROOT_URL as described in Session.request().

Yields

page (dict) – An individual response as a dictionary.
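A simplified sketch of the pagination loop, independent of any HTTP library. It assumes each page carries a top-level "links" array whose entries have "rel" and "href" keys; the get_json callable is a hypothetical stand-in for making a GET request and decoding the JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def next_href(page: Dict[str, Any]) -> Optional[str]:
    """Return the href of the first link whose rel is "next", if any."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def paginate(get_json: Callable[[str], Dict[str, Any]], url: str) -> Iterator[Dict[str, Any]]:
    """Yield each page, following "next" links until none remain."""
    next_url: Optional[str] = url
    while next_url:
        page = get_json(next_url)
        yield page
        next_url = next_href(page)
```

Because the function is a generator, later pages are only requested as the caller iterates.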

request(method: str, url: str, **kwargs: Any) requests.models.Response[source]

Overrides the default requests.Session.request() method to prepend the MLHub root URL if the given url does not include a scheme. This will raise an AuthenticationError if a 401 response is returned by the server, and an HTTPError if any other status code of 400 or above is returned.

Parameters
  • method (str) – The request method to use. Passed directly to the method argument of requests.Session.request()

  • url (str) – Either a full URL or a path relative to the ROOT_URL. For example, to make a request to the Radiant MLHub API /collections endpoint, you could use session.get('collections').

  • **kwargs – All other keyword arguments are passed directly to requests.Session.request() (see that documentation for an explanation of these keyword arguments).

Raises
  • AuthenticationError – If the response status code is 401

  • HTTPError – For all other response status codes at or above 400
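The URL handling can be approximated with the standard library as shown below. This is a sketch rather than the client's code: resolve_url is a hypothetical name, and stripping a leading slash is an assumption made so that both 'collections' and '/collections' resolve beneath the root path.

```python
from urllib.parse import urljoin, urlparse

DEFAULT_ROOT_URL = "https://api.radiant.earth/mlhub/v1/"


def resolve_url(url: str, root_url: str = DEFAULT_ROOT_URL) -> str:
    """Prepend root_url to url unless url already includes a scheme."""
    if urlparse(url).scheme:
        return url
    # Strip leading slashes so the path is joined beneath the root rather
    # than replacing the root's own path.
    return urljoin(root_url, url.lstrip("/"))
```

Under these assumptions, resolve_url('collections') yields the full /collections endpoint URL, while a URL that already starts with https:// is returned unchanged.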

radiant_mlhub.session.get_session(*, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.session.Session[source]

Gets a Session object that uses the given api_key for all requests. If no api_key argument is provided then the function will try to resolve an API key by finding the following values (in order of preference):

  1. An MLHUB_API_KEY environment variable

  2. An api_key value found in the given profile section of ~/.mlhub/profiles

  3. An api_key value found in the default section of ~/.mlhub/profiles

Parameters
  • api_key (str, optional) – The API key to use for all requests from the session. See description above for how the API key is resolved if not provided as an argument.

  • profile (str, optional) – The name of a profile configured in the .mlhub/profiles file. This will be passed directly to from_config().

Returns

session

Return type

Session

Raises

APIKeyNotFound – If no API key can be resolved.

Examples

>>> from radiant_mlhub import get_session
>>> # Get a session using the API key from the "default" profile
>>> session = get_session()
>>> # Get a session using the API key from the "project1" profile
>>> # Alternatively, you could set the MLHUB_PROFILE environment variable to "project1"
>>> session = get_session(profile='project1')
>>> # Pass an API key directly to the session
>>> # Alternatively, you could set the MLHUB_API_KEY environment variable to "some-api-key"
>>> session = get_session(api_key='some-api-key')
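The order of preference can also be sketched as a standalone function. Several details are assumptions made for illustration: the name resolve_api_key is hypothetical, the profiles file is assumed to be INI-formatted and is passed in as text, and the sketch returns None where the real function raises APIKeyNotFound.

```python
import configparser
from typing import Mapping, Optional


def resolve_api_key(api_key: Optional[str], profile: Optional[str],
                    env: Mapping[str, str], profiles_text: str) -> Optional[str]:
    """Resolve an API key following the documented order of preference."""
    if api_key:                       # an explicit argument always wins
        return api_key
    if env.get("MLHUB_API_KEY"):      # 1. environment variable
        return env["MLHUB_API_KEY"]
    parser = configparser.ConfigParser()
    parser.read_string(profiles_text)
    # 2. the requested profile, otherwise 3. the "default" profile
    name = profile or env.get("MLHUB_PROFILE") or "default"
    if parser.has_option(name, "api_key"):
        return parser.get(name, "api_key")
    return None                       # the real client raises APIKeyNotFound
```

Passing the environment and file contents as arguments keeps the sketch free of side effects, which makes each branch of the resolution order easy to check in isolation.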

Module contents

class radiant_mlhub.Collection(id: str, description: str, extent: pystac.collection.Extent, title: Optional[str] = None, stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, extra_fields: Optional[Dict[str, Any]] = None, catalog_type: Optional[pystac.catalog.CatalogType] = None, license: str = 'proprietary', keywords: Optional[List[str]] = None, providers: Optional[List[pystac.provider.Provider]] = None, summaries: Optional[pystac.summaries.Summaries] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.collection.Collection

Class inheriting from pystac.Collection that adds some convenience methods for listing and fetching from the Radiant MLHub API.

property archive_size: Optional[int]

The size of the tarball archive for this collection in bytes (or None if the archive does not exist).

download(output_dir: Union[str, pathlib.Path], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) pathlib.Path[source]

Downloads the archive for this collection to the given output directory. If the parent directories for output_path do not exist, they will be created.

The if_exists argument determines how to handle an existing archive file in the output directory. See the documentation for the download_archive() function for details. The default behavior is to resume downloading if the existing file is incomplete and skip the download if it is complete.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (Path) – Path to a local directory to which the file will be downloaded. File name will be generated automatically based on the download URL.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_path – The path to the downloaded archive file.

Return type

pathlib.Path

Raises

FileExistsError – If file at output_path already exists and both exist_okay and overwrite are False.

classmethod fetch(collection_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Creates a Collection instance by fetching the collection with the given ID from the Radiant MLHub API.

Parameters
  • collection_id (str) – The ID of the collection to fetch (e.g. bigearthnet_v1_source).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collection

Return type

Collection

fetch_item(item_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) pystac.item.Item[source]

classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) Collection[source]

Patches the pystac.Collection.from_dict() method so that it returns the calling class instead of always returning a pystac.Collection instance.

get_items(*, api_key: Optional[str] = None, profile: Optional[str] = None) Iterator[pystac.item.Item][source]

Note

The get_items method is not implemented for Radiant MLHub Collection instances for performance reasons. Please use the Collection.download() method to download Collection assets.

Raises

NotImplementedError

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[Collection][source]

Returns a list of Collection instances for all collections hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

collections

Return type

List[Collection]

property registry_url: Optional[str]

The URL of the registry page for this Collection. The URL is based on the DOI identifier for the collection. If the Collection does not have a "sci:doi" property then registry_url will be None.

class radiant_mlhub.Dataset(id: str, collections: List[Dict[str, Any]], title: Optional[str] = None, registry: Optional[str] = None, doi: Optional[str] = None, citation: Optional[str] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None, **_: Any)[source]

Bases: object

Class that brings together multiple Radiant MLHub “collections” that are all considered part of a single “dataset”. For instance, the bigearthnet_v1 dataset is composed of both a source imagery collection (bigearthnet_v1_source) and a labels collection (bigearthnet_v1_labels).

id

The dataset ID.

Type

str

title

The title of the dataset (or None if dataset has no title).

Type

str or None

registry_url

The URL to the registry page for this dataset, or None if no registry page exists.

Type

str or None

doi

The DOI identifier for this dataset, or None if there is no DOI for this dataset.

Type

str or None

citation

The citation information for this dataset, or None if there is no citation information.

Type

str or None

property collections: radiant_mlhub.models.dataset._CollectionList

List of collections associated with this dataset. The list that is returned has 2 additional attributes (source_imagery and labels) that represent the lists of collections corresponding to each type.

Note

This is a cached property, so updating self.collection_descriptions after calling self.collections the first time will have no effect on the results. See functools.cached_property() for details on clearing the cached value.

Examples

>>> from radiant_mlhub import Dataset
>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> len(dataset.collections)
2
>>> len(dataset.collections.source_imagery)
1
>>> len(dataset.collections.labels)
1

To loop through all collections:

>>> for collection in dataset.collections:
...     pass  # Do something here

To loop through only the source imagery collections:

>>> for collection in dataset.collections.source_imagery:
...     pass  # Do something here

To loop through only the label collections:

>>> for collection in dataset.collections.labels:
...     pass  # Do something here

download(output_dir: Union[pathlib.Path, str], *, if_exists: str = 'resume', api_key: Optional[str] = None, profile: Optional[str] = None) List[pathlib.Path][source]

Downloads archives for all collections associated with this dataset to the given directory. Each archive will be named using the collection ID (e.g. some_collection.tar.gz). If output_dir does not exist, it will be created.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (str or pathlib.Path) – The directory into which the archives will be written.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

output_paths – List of paths to the downloaded archives

Return type

List[pathlib.Path]

Raises
  • IOError – If output_dir exists and is not a directory.

  • FileExistsError – If one of the archive files already exists in the output_dir and both exist_okay and overwrite are False.

classmethod fetch(dataset_id_or_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by first trying to fetch the dataset by ID, then falling back to fetching by DOI.

Parameters
  • dataset_id_or_doi (str) – The ID or DOI of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod fetch_by_doi(dataset_doi: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given DOI from the Radiant MLHub API.

Parameters
  • dataset_doi (str) – The DOI of the dataset to fetch (e.g. 10.6084/m9.figshare.12047478.v2).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or in an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod fetch_by_id(dataset_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) Dataset[source]

Creates a Dataset instance by fetching the dataset with the given ID from the Radiant MLHub API.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch (e.g. bigearthnet_v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

dataset

Return type

Dataset

classmethod list(*, tags: Optional[Union[str, Iterable[str]]] = None, text: Optional[Union[str, Iterable[str]]] = None, api_key: Optional[str] = None, profile: Optional[str] = None) List[Dataset][source]

Returns a list of Dataset instances, one for each dataset hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • tags (str or Iterable[str], optional) – A list of tags to filter datasets by. If not None, only datasets containing all provided tags will be returned.

  • text (str or Iterable[str], optional) – A list of text phrases to filter datasets by. If not None, only datasets containing all phrases will be returned.

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Yields

dataset (Dataset)
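The "all tags" and "all phrases" semantics of the tags and text filters can be illustrated with a local stand-in. This sketch filters plain dictionaries in memory and makes no claim about how the API performs the filtering; filter_datasets, the record layout, the case-insensitive title match, and the example records are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Union

StrOrStrs = Optional[Union[str, Iterable[str]]]


def _as_list(value: StrOrStrs) -> List[str]:
    """Accept None, a single string, or an iterable of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def filter_datasets(
    datasets: Iterable[Dict[str, Any]],
    tags: StrOrStrs = None,
    text: StrOrStrs = None,
) -> Iterator[Dict[str, Any]]:
    """Yield records that carry every requested tag and every phrase."""
    wanted_tags = set(_as_list(tags))
    phrases = [phrase.lower() for phrase in _as_list(text)]
    for record in datasets:
        if not wanted_tags.issubset(record.get("tags", [])):
            continue
        title = record.get("title", "").lower()
        if all(phrase in title for phrase in phrases):
            yield record


# Made-up records, not real MLHub datasets.
records = [
    {"id": "example_a", "title": "Crop Types Example", "tags": ["crop", "segmentation"]},
    {"id": "example_b", "title": "Cloud Cover Example", "tags": ["segmentation"]},
]

for record in filter_datasets(records, tags=["crop", "segmentation"]):
    print(record["id"])
```

Passing a single string behaves like a one-element list, matching the Union[str, Iterable[str]] type in the signature.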

property total_archive_size: Optional[int]

Gets the total size (in bytes) of the archives for all collections associated with this dataset. If no archives exist, returns None.

class radiant_mlhub.MLModel(id: str, geometry: Optional[Dict[str, Any]], bbox: Optional[List[float]], datetime: Optional[datetime.datetime], properties: Dict[str, Any], stac_extensions: Optional[List[str]] = None, href: Optional[str] = None, collection: Optional[Union[str, pystac.collection.Collection]] = None, extra_fields: Optional[Dict[str, Any]] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None)[source]

Bases: pystac.item.Item

classmethod fetch(model_id: str, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Fetches an MLModel instance by ID.

Parameters
  • model_id (str) – The ID of the ML Model to fetch (e.g. model-cyclone-wind-estimation-torchgeo-v1).

  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

model

Return type

MLModel

classmethod from_dict(d: Dict[str, Any], href: Optional[str] = None, root: Optional[pystac.catalog.Catalog] = None, migrate: bool = False, preserve_dict: bool = True, *, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.models.ml_model.MLModel[source]

Patches the pystac.Item.from_dict() method so that it returns the calling class instead of always returning a pystac.Item instance.

classmethod list(*, api_key: Optional[str] = None, profile: Optional[str] = None) List[radiant_mlhub.models.ml_model.MLModel][source]

Returns a list of MLModel instances for all models hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters
  • api_key (str) – An API key to use for this request. This will override an API key set in a profile or using an environment variable.

  • profile (str) – A profile to use when making this request.

Returns

models

Return type

List[MLModel]

session_kwargs: Dict[str, Any] = {}

Class inheriting from pystac.Item that adds some convenience methods for listing and fetching from the Radiant MLHub API.

radiant_mlhub.get_session(*, api_key: Optional[str] = None, profile: Optional[str] = None) radiant_mlhub.session.Session[source]

Gets a Session object that uses the given api_key for all requests. If no api_key argument is provided then the function will try to resolve an API key by finding the following values (in order of preference):

  1. An MLHUB_API_KEY environment variable

  2. An api_key value found in the given profile section of ~/.mlhub/profiles

  3. An api_key value found in the default section of ~/.mlhub/profiles
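The resolution order above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the profiles file is INI-style with one section per profile, it treats the default section as a fallback when the named profile has no key, and resolve_api_key with its profiles_text and environ arguments is a hypothetical helper. LookupError stands in for APIKeyNotFound.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_api_key(
    profiles_text: str,
    *,
    api_key: Optional[str] = None,
    profile: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve an API key from an argument, the environment, or profiles."""
    if environ is None:
        environ = os.environ
    # An explicit argument wins over everything else.
    if api_key:
        return api_key
    # 1. The MLHUB_API_KEY environment variable.
    env_key = environ.get("MLHUB_API_KEY")
    if env_key:
        return env_key
    # 2. The named profile section, then 3. the "default" section.
    config = configparser.ConfigParser()
    config.read_string(profiles_text)
    for section in (profile, "default"):
        if section and config.has_option(section, "api_key"):
            return config.get(section, "api_key")
    # Stand-in for the APIKeyNotFound error named in this reference.
    raise LookupError("No API key could be resolved")


example_profiles = """
[default]
api_key = default-key

[project1]
api_key = project1-key
"""

print(resolve_api_key(example_profiles, profile="project1", environ={}))
```

Passing environ as a plain dictionary keeps the sketch independent of the real process environment, which makes the precedence easy to check.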

Parameters
  • api_key (str, optional) – The API key to use for all requests from the session. See description above for how the API key is resolved if not provided as an argument.

  • profile (str, optional) – The name of a profile configured in the .mlhub/profiles file. This will be passed directly to from_config().

Returns

session

Return type

Session

Raises

APIKeyNotFound – If no API key can be resolved.

Examples

>>> from radiant_mlhub import get_session
# Get a session that uses the API key from the "default" profile
>>> session = get_session()
# Get the session from the "project1" profile
# Alternatively, you could set the MLHUB_PROFILE environment variable to "project1"
>>> session = get_session(profile='project1')
# Pass an API key directly to the session
# Alternatively, you could set the MLHUB_API_KEY environment variable to "some-api-key"
>>> session = get_session(api_key='some-api-key')

CLI Tools

mlhub

CLI tool for the radiant_mlhub Python client.

mlhub [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

configure

Interactively set up the radiant_mlhub configuration file.

This tool walks you through setting up a ~/.mlhub/profiles file and adding an API key. If you do not provide a --profile option, it will update the “default” profile. If you do not provide an --api-key option, you will be prompted to enter an API key by the tool.

If you need to change the location of the profiles file, set the MLHUB_HOME environment variable before running this command.
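As a rough picture of how MLHUB_HOME changes the profiles location, the sketch below follows the example from the Getting Started guide, where MLHUB_HOME=/tmp/some-directory/.mlhub leads to /tmp/some-directory/.mlhub/profiles. The helper profiles_path and its environ argument are hypothetical and only mirror that documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def profiles_path(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the profiles file location implied by MLHUB_HOME, if set."""
    if environ is None:
        environ = os.environ
    home = environ.get("MLHUB_HOME")
    # Without MLHUB_HOME, fall back to ~/.mlhub in the user's home directory.
    base = Path(home) if home else Path.home() / ".mlhub"
    return base / "profiles"


print(profiles_path({"MLHUB_HOME": "/tmp/some-directory/.mlhub"}))
```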

For details on profiles and authentication for the radiant_mlhub client, please see the official Authentication documentation:

https://radiant-mlhub.readthedocs.io

mlhub configure [OPTIONS]

Options

--profile <profile>

The name of the profile to configure.

--api-key <api_key>

The API key to use for this profile.

Indices and tables