Getting Started

This guide will walk you through the basic usage of the radiant_mlhub library, including:

Installing & configuring the library
Discovering & fetching datasets
Discovering & fetching collections
Downloading assets

Installation

Install with `pip`

$ pip install radiant_mlhub

Install with `conda`

$ conda install -c conda-forge radiant-mlhub

Configuration

If you have not done so already, you will need to register for an MLHub API key here.

Once you have your API key, you will need to create a default profile by setting up a .mlhub/profiles file in your home directory. You can use the mlhub configure command line tool to do this:

$ mlhub configure
API Key: Enter your API key here...
Wrote profile to /Users/youruser/.mlhub/profiles

Hint

If you do not have write access to the home directory on your machine, you can change the location of the profiles file using the MLHUB_HOME environment variables. For instance, setting MLHUB_HOME=/tmp/some-directory/.mlhub will cause the client to look for your profiles in a /tmp/some-directory/.mlhub/profiles file. You may want to permanently set this environment variable to ensure the client continues to look in the correct place for your profiles.

List Datasets

Once you have your profile configured, you can get a list of the available datasets from the Radiant MLHub API using the Dataset.list method. This method is a generator that yields Dataset instances. You can use the id and title properties to get more information about a dataset.

>>> from radiant_mlhub import Dataset
>>> for dataset in Dataset.list():
...     print(f'{dataset.id}: {dataset.title}')
'bigearthnet_v1: BigEarthNet V1'

Fetch a Dataset

You can also fetch a dataset by ID using the Dataset.fetch method. This method returns a Dataset instance.

>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> print(f'{dataset.id}: {dataset.title}')
'bigearthnet_v1: BigEarthNet V1'

Work with Dataset Collections

Datasets have 1 or more collections associated with them. Collections fall into 2 types:

source_imagery: Collections of source imagery associated with the dataset
labels: Collections of labeled data associated with the dataset (these collections implement the STAC Label Extension)

To list all the collections associated with a dataset use the collections attribute.

>>> dataset.collections
[<Collection id=bigearthnet_v1_source>, <Collection id=bigearthnet_v1_labels>]
>>> type(dataset.collections[0])
<class 'radiant_mlhub.models.Collection'>

You can also list the collections by type using the collections.source_imagery and collections.labels properties

>>> from pprint import pprint
>>> len(dataset.collections.source_imagery)
1
>>> source_collection = dataset.collections.source_imagery[0]
>>> pprint(source_collection.to_dict())
{'description': 'BigEarthNet v1.0',
 'extent': {'spatial': {'bbox': [[-9.00023345437725,
                                  1.7542686833884724,
                                  83.44558248555553,
                                  68.02168200047284]]},
            'temporal': {'interval': [['2017-06-13T10:10:31Z',
                                       '2018-05-29T11:54:01Z']]}},
 'id': 'bigearthnet_v1_source',
 'keywords': [],
 'license': 'CDLA-Permissive-1.0',
 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source',
            'rel': 'self',
            'type': 'application/json'},
           {'href': 'https://api.radiant.earth/mlhub/v1',
            'rel': 'root',
            'type': 'application/json'}],
 'properties': {},
 'providers': [{'name': 'BigEarthNet',
                'roles': ['processor', 'licensor'],
                'url': 'https://api.radiant.earth/mlhub/v1/download/dummy-download-key'}],
 'sci:citation': 'G. Sumbul, M. Charfuelan, B. Demir, V. Markl, "BigEarthNet: '
                 'A Large-Scale Benchmark Archive for Remote Sensing Image '
                 'Understanding", IEEE International Geoscience and Remote '
                 'Sensing Symposium, pp. 5901-5904, Yokohama, Japan, 2019.',
 'stac_extensions': ['eo', 'sci'],
 'stac_version': '1.0.0-beta.2',
 'summaries': {},
 'title': None}

Download a Collection Archive

You can download all the assets associated with a collection using the Collection.download method. This method takes a path to a directory on the local file system where the archive should be saved.

If a file of the same name already exists, the client will check whether the downloaded file is complete by comparing its size against the size of the remote file. If they are the same size, the download is skipped, otherwise the download will be resumed from the point where it stopped. You can control this behavior using the if_exists argument. Setting this to "skip" will skip the download for existing files without checking for completeness (a bit faster since it doesn’t require a network request), and setting this to "overwrite" will overwrite any existing file.

>>> source_collection.download('~/Downloads')
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]

Collection archives are gzipped tarballs. You can read more about the structure of these archives in this Medium post.