Getting Started
This guide will walk you through the basic usage of the radiant_mlhub
library, including:
Installing & configuring the library
Discovering & fetching datasets
Discovering & fetching collections
Downloading a dataset's STAC catalog and assets
Background Info
If you have not already, browse Radiant MLHub to discover the datasets and ML models currently published there. Consider browsing the STAC specification to learn about SpatioTemporal Asset Catalogs (STAC). The MLHub API serves STAC Collections, Items, and Assets.
Installation
Install with pip
$ pip install radiant_mlhub
Install with conda
$ conda install -c conda-forge radiant-mlhub
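You can verify the installation by printing the package version (a quick sketch; it assumes your installed release exposes a __version__ attribute):
$ python -c "import radiant_mlhub; print(radiant_mlhub.__version__)"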
Configuration
If you have not done so already, you will need to register for an MLHub API key at https://mlhub.earth.
Once you have your API key, you will need to create a default profile by setting up a .mlhub/profiles file in your home directory. You can use the mlhub configure command line tool to do this:
$ mlhub configure
API Key: Enter your API key here...
Wrote profile to /home/user/.mlhub/profiles
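If you prefer not to store a profiles file, most client methods also accept an api_key keyword argument (a minimal sketch; the key value below is a placeholder, substitute your own):
>>> from radiant_mlhub import Dataset
>>> # 'your-api-key-here' is a hypothetical placeholder, not a real key
>>> datasets = Dataset.list(api_key='your-api-key-here')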
Hint
If you do not have write access to the home directory on your machine, you can change the location of the profiles file using the MLHUB_HOME environment variable. For instance, setting MLHUB_HOME=/tmp/some-directory/.mlhub will cause the client to look for your profiles in a /tmp/some-directory/.mlhub/profiles file. You may want to set this environment variable permanently to ensure the client continues to look in the correct place for your profiles.
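For example, in a POSIX shell you could set the variable for the current session (using the path from the hint above):
$ export MLHUB_HOME=/tmp/some-directory/.mlhub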
List Datasets
Once you have your profile configured, you can get a list of the available datasets from the Radiant MLHub API using the Dataset.list method. Remember that all datasets are also browsable and searchable on Radiant MLHub.
>>> from radiant_mlhub import Dataset
>>> datasets = Dataset.list()
>>> # print the first 5 datasets, for example
>>> for dataset in datasets[:5]:
...     print(dataset)
umd_mali_crop_type: 2019 Mali CropType Training Data
idiv_asia_crop_type: A crop type dataset for consistent land cover classification in Central Asia
dlr_fusion_competition_germany: A Fusion Dataset for Crop Type Classification in Germany
ref_fusion_competition_south_africa: A Fusion Dataset for Crop Type Classification in Western Cape, South Africa
bigearthnet_v1: BigEarthNet
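Each entry in the list is a Dataset instance; the id and title shown in the printed output are available as attributes (a small sketch using the first result above):
>>> first = datasets[0]
>>> first.id
'umd_mali_crop_type'
>>> first.title
'2019 Mali CropType Training Data'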
Fetch a Dataset
You can also fetch a dataset by ID using the Dataset.fetch method. This method returns a Dataset instance.
>>> dataset = Dataset.fetch('bigearthnet_v1')
>>> print(dataset)
bigearthnet_v1: BigEarthNet V1
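If the ID does not exist, Dataset.fetch raises an exception rather than returning None. A hedged sketch, assuming the EntityDoesNotExist exception exported by radiant_mlhub.exceptions in recent releases:
>>> from radiant_mlhub.exceptions import EntityDoesNotExist
>>> try:
...     Dataset.fetch('no_such_dataset')  # hypothetical, nonexistent ID
... except EntityDoesNotExist:
...     print('No dataset with that ID')
No dataset with that ID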
Work with Dataset Collections
Datasets have one or more collections associated with them. Collections fall into two types:
source_imagery: Collections of source imagery associated with the dataset
labels: Collections of labeled data associated with the dataset (these collections implement the STAC Label Extension)
To list all the collections associated with a dataset, use the collections attribute.
>>> dataset.collections
[<Collection id=bigearthnet_v1_source>, <Collection id=bigearthnet_v1_labels>]
>>> type(dataset.collections[0])
radiant_mlhub.models.collection.Collection
You can also list the collections by type using the collections.source_imagery and collections.labels properties. The example below shows that collections are actually STAC objects.
>>> from pprint import pprint
>>> len(dataset.collections.source_imagery)
1
>>> source_collection = dataset.collections.source_imagery[0]
>>> pprint(source_collection.to_dict())
{'description': 'BigEarthNet v1.0',
'extent': {'spatial': {'bbox': [[-9.00023345437725,
36.956956702083396,
31.598439091981028,
68.02168200047284]]},
'temporal': {'interval': [['2017-06-13T10:10:31Z',
'2018-05-29T11:54:01Z']]}},
'id': 'bigearthnet_v1_source',
'license': 'CDLA-Permissive-1.0',
'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source/items',
'rel': 'items',
'type': 'application/geo+json'},
{'href': 'https://api.radiant.earth/mlhub/v1/',
'rel': 'parent',
'type': 'application/json'},
{'href': 'https://api.radiant.earth/mlhub/v1/',
'rel': <RelType.ROOT: 'root'>,
'title': 'Radiant MLHub API',
'type': <MediaType.JSON: 'application/json'>},
{'href': 'https://api.radiant.earth/mlhub/v1/collections/bigearthnet_v1_source',
'rel': 'self',
'type': 'application/json'}],
'providers': [{'name': 'BigEarthNet',
'roles': ['processor', 'licensor'],
'url': 'http://bigearth.net'}],
'sci:citation': 'G. Sumbul, M. Charfuelan, B. Demir, V. Markl, "BigEarthNet: '
'A Large-Scale Benchmark Archive for Remote Sensing Image '
'Understanding", IEEE International Geoscience and Remote '
'Sensing Symposium, pp. 5901-5904, Yokohama, Japan, 2019.',
'sci:doi': '10.14279/depositonce-10149',
'stac_extensions': ['https://stac-extensions.github.io/scientific/v1.0.0/schema.json'],
'stac_version': '1.0.0',
'type': 'Collection'}
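The labels property works the same way. A small sketch, using the collection ID listed earlier; the isinstance check assumes that Collection subclasses pystac.Collection, which the STAC-style output above suggests:
>>> labels_collection = dataset.collections.labels[0]
>>> labels_collection.id
'bigearthnet_v1_labels'
>>> import pystac
>>> isinstance(labels_collection, pystac.Collection)  # assumed inheritance
True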
Download a Dataset
You can download a dataset's STAC catalog, and all of its linked assets, using the Dataset.download method. Consider checking the dataset size before downloading. The example below uses a relatively small dataset, but the downloader also scales up to the largest datasets.
>>> dataset = Dataset.fetch('nasa_marine_debris')
>>> print(dataset)
nasa_marine_debris: Marine Debris Dataset for Object Detection in Planetscope Imagery
>>> print(dataset.stac_catalog_size) # OK the STAC catalog archive is only ~260KB
263582
>>> print(dataset.estimated_dataset_size) # OK the total dataset assets are ~77MB
77207762
>>> dataset.download()
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 412.53KB/s]
INFO:radiant_mlhub.client.catalog_downloader:unarchive nasa_marine_debris.tar.gz ...
unarchive nasa_marine_debris.tar.gz: 100%|████████████████████| 2830/2830 [00:00<00:00, 5772.09it/s]
INFO:radiant_mlhub.client.catalog_downloader:create stac asset list (please wait) ...
INFO:radiant_mlhub.client.catalog_downloader:2825 unique assets in stac catalog.
download assets: 100%|██████████████████████| 2825/2825 [03:27<00:00, 13.62it/s]
INFO:radiant_mlhub.client.catalog_downloader:assets saved to nasa_marine_debris
The Dataset.download method saves the STAC catalog and assets into your current working directory by default. The downloader can fetch assets in parallel across many cores, resume interrupted downloads, and filter the assets down to a more manageable size (highly recommended, depending on your application), as sketched below.
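The sketch assumes the output_dir and collection_filter parameters documented in the Datasets guide and the Dataset.download API reference; the filter keys and values here are hypothetical, so check your version before copying them:
>>> dataset.download(
...     output_dir='~/data',  # hypothetical output location
...     collection_filter={'nasa_marine_debris_labels': ['labels']},  # hypothetical collection ID -> asset keys
... )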
Hint
The Datasets guide has more downloading examples, and the Dataset.download API reference is available as well.
Hint
The Collections guide has examples of downloading collection archives. Collection archives are not available for all collections, so consider using the Dataset downloader instead.
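Where an archive does exist, a hedged sketch assuming the Collection.download method shown in the Collections guide:
>>> source_collection.download(output_dir='~/Downloads')  # hypothetical destination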
Discovering ML Models
ML Models are discoverable through the Python client as well. See the ML Models guide for more information.
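As a hedged sketch, recent releases expose an MLModel class whose list and fetch methods mirror those of Dataset; see the ML Models guide for the authoritative usage:
>>> from radiant_mlhub import MLModel
>>> models = MLModel.list()
>>> model = MLModel.fetch(models[0].id)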