radiant_mlhub package

Submodules

radiant_mlhub.client module

Low-level functions for making requests to MLHub API endpoints.

radiant_mlhub.client.download_archive(archive_id: str, output_dir: Optional[pathlib.Path] = None, *, if_exists: str = 'resume', **session_kwargs) → pathlib.Path

Downloads the archive with the given ID to an output location (current working directory by default).

The if_exists argument determines how to handle an existing archive file in the output directory. The default behavior (if_exists="resume") is to resume the download by requesting a byte range starting at the size of the existing file. If the existing file is already the same size as the file to be downloaded (as determined by the Content-Length header), the download is skipped. Use if_exists="skip" to skip the download whenever a file already exists; this may be faster if you know the download was not interrupted, since no network request is made to get the archive size. Use if_exists="overwrite" to replace the existing file and download the entire archive again.

Parameters
  • archive_id (str) – The ID of the archive to download. This is the same as the Collection ID.

  • output_dir (Path) – Path to which the archive will be downloaded. Defaults to the current working directory.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

output_path – The full path to the downloaded archive file.

Return type

Path

Raises

ValueError – If if_exists is not one of "skip", "overwrite", or "resume".
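
The if_exists handling described above can be pictured as a small decision function. The sketch below is not the library's implementation: plan_download and its return values are hypothetical, and it assumes the total archive size has already been read from the Content-Length header.

```python
from typing import Dict, Optional, Tuple


def plan_download(
    existing_size: Optional[int],
    total_size: int,
    if_exists: str = "resume",
) -> Tuple[str, Dict[str, str]]:
    """Decide what to do about an existing archive file (hypothetical helper).

    existing_size is the size in bytes of the file already on disk, or None
    if there is no such file. total_size is the full archive size in bytes.
    Returns an (action, headers) pair, where action is "skip" or "download".
    """
    if if_exists not in {"skip", "overwrite", "resume"}:
        raise ValueError('if_exists must be one of "skip", "overwrite", or "resume"')

    # No existing file: always download everything.
    if existing_size is None:
        return "download", {}

    if if_exists == "skip":
        return "skip", {}
    if if_exists == "overwrite":
        return "download", {}

    # "resume": skip if the file is already complete, otherwise request only
    # the missing bytes with an HTTP Range header.
    if existing_size >= total_size:
        return "skip", {}
    return "download", {"Range": f"bytes={existing_size}-"}
```

In a real client the "skip" branch would be checked before any request is made, which is why it avoids the network round trip mentioned above.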

radiant_mlhub.client.get_archive_info(archive_id: str, **session_kwargs) → dict

Gets info for the given archive from the /archive/{archive_id}/info endpoint as a JSON-like dictionary.

The JSON object returned by the API has the following properties:

  • collection: The ID of the Collection that this archive is associated with.

  • dataset: The ID of the dataset that this archive’s Collection belongs to.

  • size: The size of the archive (in bytes)

  • types: The types associated with this archive’s Collection. Will be one of "source_imagery" or "label".

Parameters
  • archive_id (str) – The ID of the archive. This is the same as the Collection ID.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

archive_info – JSON-like dictionary representing the API response.

Return type

dict
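
As a rough illustration of working with the returned dictionary, the sketch below formats the documented properties into a one-line summary. The describe_archive helper and the values in example_info are made up for illustration, and it assumes types is a single string as described above.

```python
def describe_archive(archive_info: dict) -> str:
    """Summarize an archive info dictionary in one line (hypothetical helper)."""
    size = float(archive_info["size"])
    # Convert the byte count to the largest unit that keeps the value below 1024.
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            break
        size /= 1024
    return (
        f"{archive_info['collection']} ({archive_info['types']}) "
        f"from dataset {archive_info['dataset']}: {size:.1f} {unit}"
    )


# Made-up values for illustration only; a real response comes from get_archive_info().
example_info = {
    "collection": "bigearthnet_v1_source",
    "dataset": "bigearthnet_v1",
    "size": 5_000_000,
    "types": "source_imagery",
}
print(describe_archive(example_info))
```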

radiant_mlhub.client.get_collection(collection_id: str, **session_kwargs) → dict

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{collection_id} endpoint.

See the MLHub API docs for details.

Parameters
  • collection_id (str) – The ID of the collection to fetch

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

collection

Return type

dict

Raises

EntityDoesNotExist – If the requested collection does not exist in the Radiant MLHub API.

radiant_mlhub.client.get_collection_item(collection_id: str, item_id: str, **session_kwargs) → dict

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /collections/{collection_id}/items/{item_id} endpoint.

Parameters
  • collection_id (str) – The ID of the Collection to which the Item belongs.

  • item_id (str) – The ID of the Item.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

item

Return type

dict

radiant_mlhub.client.get_dataset(dataset_id: str, **session_kwargs) → dict

Returns a JSON-like dictionary representing the response from the Radiant MLHub GET /datasets/{dataset_id} endpoint.

See the MLHub API docs for details.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

dataset

Return type

dict

radiant_mlhub.client.list_collection_items(collection_id: str, *, page_size: Optional[int] = None, extensions: Optional[List[str]] = None, limit: int = 10, **session_kwargs) → Iterator[dict]

Yields JSON-like dictionaries representing STAC Item objects returned by the Radiant MLHub GET /collections/{collection_id}/items endpoint.

Note

Because some collections may contain hundreds of thousands of items, this function limits the total number of items it yields to 10 by default. You can raise this cap with the limit keyword argument, or set limit to None to list all items. Be aware that listing all items in a large collection may take a very long time.

Parameters
  • collection_id (str) – The ID of the collection from which to fetch items

  • page_size (int) – The number of items to return in each page. If set to None, then this parameter will not be passed to the API and the default API value will be used (currently 30).

  • extensions (list) – If provided, then only items that support all of the extensions listed will be returned.

  • limit (int) – The maximum total number of items to yield. Defaults to 10.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Yields

item (dict) – JSON-like dictionary representing a STAC Item associated with the given collection.
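
The interaction between page_size and limit can be sketched with a lazy generator: pages are consumed one at a time, and iteration stops as soon as limit items have been yielded. This is an illustrative sketch rather than the library's code; iter_items is a hypothetical name, and it assumes each page is a dictionary with a "features" list, as in a STAC ItemCollection.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def iter_items(pages: Iterable[dict], limit: Optional[int] = 10) -> Iterator[dict]:
    """Yield items from successive pages, stopping after `limit` items.

    Each page is assumed to hold its items in a "features" list.
    A limit of None yields every item from every page.
    """
    all_items = (item for page in pages for item in page.get("features", []))
    # islice(iterable, None) places no upper bound on the number of items.
    yield from islice(all_items, limit)
```

Because both the page source and the item stream are lazy, pages beyond the one that satisfies the limit are never requested.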

radiant_mlhub.client.list_collections(**session_kwargs) → List[dict]

Gets a list of JSON-like dictionaries representing STAC Collection objects returned by the Radiant MLHub GET /collections endpoint.

See the MLHub API docs for details.

Parameters

**session_kwargs – Keyword arguments passed directly to get_session()

Returns

collections – List of JSON-like dictionaries representing STAC Collection objects.

Return type

List[dict]

radiant_mlhub.client.list_datasets(**session_kwargs) → List[dict]

Gets a list of JSON-like dictionaries representing dataset objects returned by the Radiant MLHub GET /datasets endpoint.

See the MLHub API docs for details.

Parameters

**session_kwargs – Keyword arguments passed directly to get_session()

Returns

datasets

Return type

List[dict]

radiant_mlhub.exceptions module

exception radiant_mlhub.exceptions.APIKeyNotFound

Bases: radiant_mlhub.exceptions.MLHubException

Raised when an API key cannot be found using any of the strategies described in the Authentication docs.

exception radiant_mlhub.exceptions.AuthenticationError

Bases: radiant_mlhub.exceptions.MLHubException

Raised when the Radiant MLHub API cannot authenticate the request, either because the API key is invalid or expired, or because no API key was provided in the request.

exception radiant_mlhub.exceptions.EntityDoesNotExist

Bases: radiant_mlhub.exceptions.MLHubException

Raised when attempting to fetch a collection that does not exist in the Radiant MLHub API.

exception radiant_mlhub.exceptions.MLHubException

Bases: Exception

Base exception class for all Radiant MLHub exceptions.

radiant_mlhub.models module

Extensions of the PySTAC classes that provide convenience methods for interacting with the Radiant MLHub API.

class radiant_mlhub.models.Collection(id, description, extent, title, stac_extensions, href, extra_fields, catalog_type, license, keywords, providers, properties, summaries, *, api_key=None, profile=None)

Bases: pystac.collection.Collection

Class inheriting from pystac.Collection that adds some convenience methods for listing and fetching from the Radiant MLHub API.

property archive_size

The size of the tarball archive for this collection in bytes (or None if the archive does not exist).

download(output_dir: pathlib.Path, *, if_exists: str = 'resume', **session_kwargs) → pathlib.Path

Downloads the archive for this collection to the given output directory. If the parent directories of the output path do not exist, they will be created.

The if_exists argument determines how to handle an existing archive file in the output directory. See the documentation for the download_archive() function for details. The default behavior is to resume downloading if the existing file is incomplete and skip the download if it is complete.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (Path) – Path to a local directory to which the file will be downloaded. File name will be generated automatically based on the download URL.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

output_path – The path to the downloaded archive file.

Return type

pathlib.Path

Raises

ValueError – If if_exists is not one of "skip", "overwrite", or "resume".

classmethod fetch(collection_id: str, **session_kwargs) → radiant_mlhub.models.Collection

Creates a Collection instance by fetching the collection with the given ID from the Radiant MLHub API.

Parameters
  • collection_id (str) – The ID of the collection to fetch (e.g. bigearthnet_v1_source).

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

collection

Return type

Collection

fetch_item(item_id: str, **session_kwargs) → pystac.item.Item

classmethod from_dict(d, href=None, root=None, *, api_key=None, profile=None)

Patches the pystac.Collection.from_dict() method so that it returns the calling class instead of always returning a pystac.Collection instance.

get_items(**session_kwargs) → Iterator[pystac.item.Item]

Note

The get_items method is not implemented for Radiant MLHub Collection instances for performance reasons. Please use the Collection.download() method to download Collection assets.

Raises

NotImplementedError

classmethod list(**session_kwargs) → List[radiant_mlhub.models.Collection]

Returns a list of Collection instances for all collections hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters

**session_kwargs – Keyword arguments passed directly to get_session()

Returns

collections

Return type

List[Collection]

property registry_url

The URL of the registry page for this Collection. The URL is based on the DOI identifier for the collection. If the Collection does not have a "sci:doi" property then registry_url will be None.

class radiant_mlhub.models.CollectionType(value)

Bases: enum.Enum

Valid values for the type of a collection associated with a Radiant MLHub dataset.

LABELS = 'labels'
SOURCE = 'source_imagery'

class radiant_mlhub.models.Dataset(id: str, collections: List[dict], title: Optional[str] = None, registry: Optional[str] = None, doi: Optional[str] = None, citation: Optional[str] = None, *, api_key: Optional[str] = None, profile: Optional[str] = None, **_)

Bases: object

Class that brings together multiple Radiant MLHub “collections” that are all considered part of a single “dataset”. For instance, the bigearthnet_v1 dataset is composed of both a source imagery collection (bigearthnet_v1_source) and a labels collection (bigearthnet_v1_labels).

id

The dataset ID.

Type

str

title

The title of the dataset (or None if dataset has no title).

Type

str or None

registry_url

The URL to the registry page for this dataset, or None if no registry page exists.

Type

str or None

doi

The DOI identifier for this dataset, or None if there is no DOI for this dataset.

Type

str or None

citation

The citation information for this dataset, or None if there is no citation information.

Type

str or None

property collections

List of collections associated with this dataset. The list that is returned has 2 additional attributes (source_imagery and labels) that hold the lists of collections corresponding to each type.

Note

This is a cached property, so updating self.collection_descriptions after calling self.collections the first time will have no effect on the results. See functools.cached_property() for details on clearing the cached value.
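
The caching behavior described in this note can be demonstrated with a small stand-in class. The Example class and its attributes below are hypothetical and are not part of radiant_mlhub; the sketch only assumes Python 3.8 or later, where functools.cached_property is available.

```python
from functools import cached_property


class Example:
    """Hypothetical stand-in used only to show cached_property semantics."""

    def __init__(self, descriptions):
        self.descriptions = descriptions

    @cached_property
    def names(self):
        # Computed on first access, then stored on the instance.
        return [d["id"] for d in self.descriptions]


ex = Example([{"id": "a"}])
first = ex.names                  # computed and cached
ex.descriptions = [{"id": "b"}]   # changing the input afterwards...
stale = ex.names                  # ...has no effect on the cached value
del ex.names                      # deleting the attribute clears the cache
fresh = ex.names                  # recomputed from the new descriptions
```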

Examples

>>> from radiant_mlhub import Dataset
>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> len(dataset.collections)
2
>>> len(dataset.collections.source_imagery)
1
>>> len(dataset.collections.labels)
1

To loop through all collections:

>>> for collection in dataset.collections:
...     pass  # Do something here

To loop through only the source imagery collections:

>>> for collection in dataset.collections.source_imagery:
...     pass  # Do something here

To loop through only the label collections:

>>> for collection in dataset.collections.labels:
...     pass  # Do something here

download(output_dir: Union[pathlib.Path, str], *, if_exists: str = 'resume', **session_kwargs) → List[pathlib.Path]

Downloads archives for all collections associated with this dataset to given directory. Each archive will be named using the collection ID (e.g. some_collection.tar.gz). If output_dir does not exist, it will be created.

Note

Some collections may be very large and take a significant amount of time to download, depending on your connection speed.

Parameters
  • output_dir (str or pathlib.Path) – The directory into which the archives will be written.

  • if_exists (str, optional) – How to handle an existing archive at the same location. If "skip", the download will be skipped. If "overwrite", the existing file will be overwritten and the entire file will be re-downloaded. If "resume" (the default), the existing file size will be compared to the size of the download (using the Content-Length header). If the existing file is smaller, then only the remaining portion will be downloaded. Otherwise, the download will be skipped.

  • **session_kwargs – Keyword arguments passed directly to get_session()

Returns

output_paths – List of paths to the downloaded archives

Return type

List[pathlib.Path]

Raises
  • IOError – If output_dir exists and is not a directory.

  • ValueError – If if_exists is not one of "skip", "overwrite", or "resume".

classmethod fetch(dataset_id: str, **session_kwargs) → radiant_mlhub.models.Dataset

Creates a Dataset instance by fetching the dataset with the given ID from the Radiant MLHub API.

Parameters
  • dataset_id (str) – The ID of the dataset to fetch (e.g. bigearthnet_v1).

  • **session_kwargs – Keyword arguments passed directly to get_session().

Returns

dataset

Return type

Dataset

classmethod list(**session_kwargs) → List[radiant_mlhub.models.Dataset]

Returns a list of Dataset instances, one for each dataset hosted by MLHub.

See the Authentication documentation for details on how authentication is handled for this request.

Parameters

**session_kwargs – Keyword arguments passed directly to get_session()

Returns

datasets

Return type

List[Dataset]

property total_archive_size

Gets the total size (in bytes) of the archives for all collections associated with this dataset. If no archives exist, returns None.

radiant_mlhub.session module

Methods and classes to simplify constructing and authenticating requests to the MLHub API.

It is generally recommended that you use the get_session() function to create sessions, since this will properly handle resolution of the API key from function arguments, environment variables, and profiles as described in Authentication. See the get_session() docs for usage examples.

class radiant_mlhub.session.Session(*, api_key: Optional[str])

Bases: requests.sessions.Session

Custom class inheriting from requests.Session with some additional conveniences:

  • Adds the API key as a key query parameter

  • Adds an Accept: application/json header

  • Adds a User-Agent header that contains the package name and version, plus basic system information like the OS name

  • Prepends the MLHub root URL (https://api.radiant.earth/mlhub/v1/) to any request paths without a domain

  • Raises a radiant_mlhub.exceptions.AuthenticationError for 401 (UNAUTHORIZED) responses

  • Calls requests.Response.raise_for_status() after all requests to raise exceptions for any status codes at or above 400.

API_KEY_ENV_VARIABLE = 'MLHUB_API_KEY'
DEFAULT_ROOT_URL = 'https://api.radiant.earth/mlhub/v1/'
MLHUB_HOME_ENV_VARIABLE = 'MLHUB_HOME'
PROFILE_ENV_VARIABLE = 'MLHUB_PROFILE'
ROOT_URL_ENV_VARIABLE = 'MLHUB_ROOT_URL'

classmethod from_config(profile: Optional[str] = None) → radiant_mlhub.session.Session

Create a session object by reading an API key from the given profile in the profiles file. By default, the client will look for the profiles file in a .mlhub directory in the user’s home directory (as determined by Path.home). However, if an MLHUB_HOME environment variable is present, the client will look in that directory instead.

Parameters

profile (str, optional) – The name of a profile configured in the profiles file.

Returns

session

Return type

Session

Raises

APIKeyNotFound – If the given config file does not exist, the given profile cannot be found, or there is no api_key property in the given profile section.
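
The lookup described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the library's code: read_api_key is a hypothetical helper, the profiles file is assumed to be INI-formatted with one section per profile, and a plain KeyError stands in for APIKeyNotFound.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def read_api_key(profile: str = "default", mlhub_home: Optional[Path] = None) -> str:
    """Read api_key for a profile from the profiles file (hypothetical helper)."""
    if mlhub_home is None:
        # MLHUB_HOME, when set, replaces the default ~/.mlhub location.
        mlhub_home = Path(os.environ.get("MLHUB_HOME", Path.home() / ".mlhub"))
    profiles_path = Path(mlhub_home) / "profiles"

    config = configparser.ConfigParser()
    # ConfigParser.read() silently ignores files that do not exist, so a
    # missing file and a missing profile both end up in the check below.
    config.read(profiles_path)
    if not config.has_option(profile, "api_key"):
        raise KeyError(f"No api_key found for profile {profile!r} in {profiles_path}")
    return config.get(profile, "api_key")
```

Passing mlhub_home explicitly makes the helper easy to exercise against a temporary directory without touching the real home directory.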

classmethod from_env() → radiant_mlhub.session.Session

Creates a session object using the API key found in the MLHUB_API_KEY environment variable.

Returns

session

Return type

Session

Raises

APIKeyNotFound – If the API key cannot be found in the environment

paginate(url: str, **kwargs) → Iterator[dict]

Makes a GET request to the given url and paginates through all results by looking for a link in each response with a rel type of "next". Any additional keyword arguments are passed directly to requests.Session.get().

Parameters

url (str) – The URL to which the initial request will be made. Note that this may either be a full URL or a path relative to the ROOT_URL as described in Session.request().

Yields

page (dict) – An individual response as a dictionary.
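
Following rel="next" links can be sketched independently of any HTTP library. In the sketch below, follow_next_links and the fetch callable are hypothetical; fetch is assumed to take a URL and return the decoded JSON response, and each response is assumed to carry its links in a "links" list.

```python
from typing import Callable, Iterator, Optional


def follow_next_links(fetch: Callable[[str], dict], url: str) -> Iterator[dict]:
    """Yield each page, following rel="next" links until none remains."""
    next_url: Optional[str] = url
    while next_url is not None:
        page = fetch(next_url)
        yield page
        # Look for a link whose rel type is "next"; stop if there is none.
        next_url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
```

Because this is a generator, each following request is made only when the caller asks for the next page.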

request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies=None, hooks=None, stream=None, verify=None, cert=None, json=None)

Overrides the default requests.Session.request() method to prepend the MLHub root URL if the given url does not include a scheme. Raises an AuthenticationError if the server returns a 401 response, and an HTTPError if it returns any other status code of 400 or above.

Parameters
  • method (str) – The request method to use. Passed directly to the method argument of requests.Session.request()

  • url (str) – Either a full URL or a path relative to the ROOT_URL. For example, to make a request to the Radiant MLHub API /collections endpoint, you could use session.get('collections').

  • **kwargs – All other keyword arguments are passed directly to requests.Session.request() (see that documentation for an explanation of these keyword arguments).

Raises
  • AuthenticationError – If the response status code is 401

  • HTTPError – For all other response status codes at or above 400
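
The URL handling can be illustrated with the standard library alone. This is a sketch of the idea, not the method's actual implementation; resolve_url is a hypothetical helper, and the root URL is the DEFAULT_ROOT_URL value listed above.

```python
from urllib.parse import urljoin, urlparse

ROOT_URL = "https://api.radiant.earth/mlhub/v1/"


def resolve_url(url: str, root_url: str = ROOT_URL) -> str:
    """Return url unchanged if it has a scheme, otherwise join it to root_url."""
    if urlparse(url).scheme:
        return url
    # Strip a leading slash so the path is appended to the root URL rather
    # than replacing the root URL's own path component.
    return urljoin(root_url, url.lstrip("/"))


print(resolve_url("collections"))
```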

radiant_mlhub.session.get_session(*, api_key: Optional[str] = None, profile: Optional[str] = None) → radiant_mlhub.session.Session

Gets a Session object that uses the given api_key for all requests. If no api_key argument is provided then the function will try to resolve an API key by finding the following values (in order of preference):

  1. An MLHUB_API_KEY environment variable

  2. An api_key value found in the given profile section of ~/.mlhub/profiles

  3. An api_key value found in the default section of ~/.mlhub/profiles

Parameters
  • api_key (str, optional) – The API key to use for all requests from the session. See description above for how the API key is resolved if not provided as an argument.

  • profile (str, optional) – The name of a profile configured in the .mlhub/profiles file. This will be passed directly to from_config().

Returns

session

Return type

Session

Raises

APIKeyNotFound – If no API key can be resolved.
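
The order of preference listed above can be written out as a plain function over dictionaries. This is a sketch only: resolve_api_key is a hypothetical name, LookupError stands in for APIKeyNotFound, and it assumes the default profile is consulted only when no profile name is given.

```python
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str],
    env: Mapping[str, str],
    profiles: Mapping[str, Mapping[str, str]],
    profile: Optional[str] = None,
) -> str:
    """Pick an API key using the documented order of preference."""
    # An explicit argument always wins.
    if api_key is not None:
        return api_key
    # 1. The MLHUB_API_KEY environment variable.
    if "MLHUB_API_KEY" in env:
        return env["MLHUB_API_KEY"]
    # 2. The named profile if one was given; 3. otherwise the default profile.
    section = profiles.get(profile or "default", {})
    if "api_key" in section:
        return section["api_key"]
    raise LookupError("No API key could be resolved")
```

Taking env and profiles as arguments, rather than reading os.environ and the profiles file directly, keeps the ordering logic easy to check in isolation.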

Examples

>>> from radiant_mlhub import get_session
>>> # Get the API key from the "default" profile
>>> session = get_session()
>>> # Get the session from the "project1" profile
>>> # Alternatively, you could set the MLHUB_PROFILE environment variable to "project1"
>>> session = get_session(profile='project1')
>>> # Pass an API key directly to the session
>>> # Alternatively, you could set the MLHUB_API_KEY environment variable to "some-api-key"
>>> session = get_session(api_key='some-api-key')

Module contents