Collections

A collection represents either a group of related labels or a group of related source imagery for a given time period and geographic area. All collections in the Radiant MLHub API are valid STAC Collections. For instance, the ref_landcovernet_v1_source collection catalogs the source imagery associated with the LandCoverNet dataset, while the ref_landcovernet_v1_labels collection catalogs the land cover labels associated with this imagery. These collections are considered part of a single ref_landcovernet_v1 dataset (see the Datasets documentation for details on working with datasets).

To discover and fetch collections, you can use either the low-level client functions in radiant_mlhub.client or the Collection class. Using the Collection class is the recommended approach, but both are described below.

Discovering Collections

The Radiant MLHub /collections endpoint returns a list of objects describing the available collections. You can use the low-level list_collections() function to work with these responses as native Python data types (list and dict). This function returns a list of JSON-like dictionaries representing STAC Collections.

>>> from radiant_mlhub.client import list_collections
>>> from pprint import pprint
>>> collections = list_collections()
>>> first_collection = collections[0]
>>> pprint(first_collection)
{'description': 'African Crops Kenya',
 'extent': {'spatial': {'bbox': [[34.18191992149459,
                                  0.4724181558451209,
                                  34.3714943155646,
                                  0.7144217206851109]]},
            'temporal': {'interval': [['2018-04-10T00:00:00Z',
                                       '2020-03-13T00:00:00Z']]}},
 'id': 'ref_african_crops_kenya_01_labels',
 'keywords': [],
 'license': 'CC-BY-SA-4.0',
 'links': [{'href': 'https://api.radiant.earth/mlhub/v1/collections/ref_african_crops_kenya_01_labels',
            'rel': 'self',
            'title': None,
            'type': 'application/json'},
           {'href': 'https://api.radiant.earth/mlhub/v1',
            'rel': 'root',
            'title': None,
            'type': 'application/json'}],
 'properties': {},
 'providers': [{'description': None,
                'name': 'Radiant Earth Foundation',
                'roles': ['licensor', 'host', 'processor'],
                'url': 'https://radiant.earth'}],
 'sci:citation': 'PlantVillage. (2019) PlantVillage Kenya Ground Reference '
                 'Crop Type Dataset, Version 1. [Indicate subset used]. '
                 'Radiant ML Hub. [Date Accessed]',
 'sci:doi': '10.34911/rdnt.u41j87',
 'stac_extensions': [],
 'stac_version': '1.0.0-beta.2',
 'summaries': {},
 'title': None}
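
Because list_collections() returns plain dictionaries, ordinary Python is enough to search or filter them. The sketch below uses a trimmed copy of the dictionary printed above as stand-in data; find_collection and ids_with_license are hypothetical helper names, not part of radiant_mlhub.

```python
from typing import Any, Dict, List, Optional

# Stand-in data: a trimmed copy of the dictionary printed above.
# In practice this list would come from list_collections().
collections: List[Dict[str, Any]] = [
    {
        "id": "ref_african_crops_kenya_01_labels",
        "description": "African Crops Kenya",
        "license": "CC-BY-SA-4.0",
        "sci:doi": "10.34911/rdnt.u41j87",
    },
]


def find_collection(items: List[Dict[str, Any]], collection_id: str) -> Optional[Dict[str, Any]]:
    """Return the first collection dict whose 'id' matches, or None."""
    return next((c for c in items if c.get("id") == collection_id), None)


def ids_with_license(items: List[Dict[str, Any]], license_id: str) -> List[str]:
    """Return the IDs of all collections published under the given license."""
    return [c["id"] for c in items if c.get("license") == license_id]


match = find_collection(collections, "ref_african_crops_kenya_01_labels")
print(match["description"])
print(ids_with_license(collections, "CC-BY-SA-4.0"))
```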

You can also discover collections using the Collection.list method. This is the recommended way to list collections, and it returns a list of Collection instances.

>>> from radiant_mlhub import Collection
>>> collections = Collection.list()
>>> first_collection = collections[0]
>>> first_collection.id
'ref_african_crops_kenya_01_labels'
>>> first_collection.description
'African Crops Kenya'
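
The collection IDs used in this section follow a pattern in which a dataset's imagery and labels share a prefix (for example, ref_landcovernet_v1_source and ref_landcovernet_v1_labels). Assuming that naming convention holds, which is not guaranteed for every collection, you could group IDs by dataset as sketched below. group_by_dataset is a hypothetical helper; with the client you would pass it [c.id for c in Collection.list()].

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention, inferred from the examples in this section only.
SUFFIXES = ("_source", "_labels")


def group_by_dataset(collection_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group collection IDs under the prefix left after stripping a known suffix.

    IDs without a known suffix are kept under their own name.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for cid in collection_ids:
        dataset = cid
        for suffix in SUFFIXES:
            if cid.endswith(suffix):
                dataset = cid[: -len(suffix)]
                break
        groups[dataset].append(cid)
    return dict(groups)


ids = ["ref_landcovernet_v1_source", "ref_landcovernet_v1_labels", "sn1_AOI_1_RIO"]
print(group_by_dataset(ids))
```

For an authoritative mapping between collections and datasets, rely on the Datasets documentation rather than on ID patterns.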

Fetching a Collection

The Radiant MLHub /collections/{collection_id} endpoint returns an object representing a single collection. You can use the low-level get_collection() function to work with this response as a dict.

>>> from radiant_mlhub.client import get_collection
>>> collection = get_collection('ref_african_crops_kenya_01_labels')
>>> pprint(collection)
{'description': 'African Crops Kenya',
 'extent': {'spatial': {'bbox': [[34.18191992149459,
                                  0.4724181558451209,
                                  34.3714943155646,
                                  0.7144217206851109]]},
            'temporal': {'interval': [['2018-04-10T00:00:00Z',
                                       '2020-03-13T00:00:00Z']]}},
 'id': 'ref_african_crops_kenya_01_labels',
 ...
 }

You can also fetch a collection from the Radiant MLHub API based on the collection ID using the Collection.fetch method. This is the recommended way of fetching a collection. This method returns a Collection instance.

>>> collection = Collection.fetch('ref_african_crops_kenya_01_labels')
>>> collection.id
'ref_african_crops_kenya_01_labels'
>>> collection.description
'African Crops Kenya'

For more information about a collection, see its page on the MLHub Registry:

>>> collection.registry_url
'https://registry.mlhub.earth/10.14279/depositonce-10149/'

Downloading a Collection

The Radiant MLHub /archive/{archive_id} endpoint allows you to download an archive of all assets associated with a given collection. You can use the low-level download_archive() function to download the archive to your local file system.

>>> from radiant_mlhub.client import download_archive
>>> archive_path = download_archive('sn1_AOI_1_RIO')
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]
>>> archive_path
PosixPath('/path/to/current/directory/sn1_AOI_1_RIO.tar.gz')

You can also download a collection archive using the Collection.download method. This is the recommended way of downloading an archive.

>>> collection = Collection.fetch('sn1_AOI_1_RIO')
>>> archive_path = collection.download('~/Downloads')  # Resumes a partial download if the file already exists
28%|██▊       | 985.0/3496.9 [00:35<00:51, 48.31M/s]
>>> archive_path
PosixPath('/Users/someuser/Downloads/sn1_AOI_1_RIO.tar.gz')

By default, if a file with the same name already exists in the output directory, these methods check whether it is complete by comparing its size with the size of the remote file. If the sizes match, the download is skipped; otherwise, the download resumes from the point where it stopped. You can control this behavior with the if_exists argument: setting it to "skip" skips the download for any existing file without checking for completeness (slightly faster, since no network request is needed), and setting it to "overwrite" replaces any existing file.
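
The decision described above can be sketched as a small pure function. This is an illustration of the documented behavior, not radiant_mlhub's actual implementation: resolve_download_action and resume_headers are hypothetical helpers, the name "resume" for the default mode is an assumption, and the HTTP Range header is simply one common way to resume a download.

```python
from typing import Dict, Optional

# "resume" as the name of the default mode is an assumption for this sketch.
VALID_MODES = ("resume", "skip", "overwrite")


def resolve_download_action(local_size: Optional[int], remote_size: int,
                            if_exists: str = "resume") -> str:
    """Decide what to do with a possibly existing local archive.

    local_size is None when no local file exists.
    Returns one of "download", "skip", "resume" or "overwrite".
    """
    if if_exists not in VALID_MODES:
        raise ValueError(f"if_exists must be one of {VALID_MODES}, got {if_exists!r}")
    if local_size is None:
        return "download"  # nothing on disk yet
    if if_exists == "skip":
        # remote_size is ignored here, so a real client could skip the size request
        return "skip"
    if if_exists == "overwrite":
        return "overwrite"
    # Default mode: compare sizes to decide whether the local file is complete.
    return "skip" if local_size == remote_size else "resume"


def resume_headers(local_size: int) -> Dict[str, str]:
    """Build an HTTP Range header asking only for the bytes not yet on disk."""
    return {"Range": f"bytes={local_size}-"}


print(resolve_download_action(1_000, 3_504_256_089))  # resume
print(resume_headers(1_000))
```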

To check the size of a collection's archive in bytes without downloading it, use the Collection.archive_size property.

>>> collection.archive_size
3504256089
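
Since archive_size is a byte count, it can be worth comparing it with the free space in your target directory before calling Collection.download. The sketch below uses only the standard library (shutil.disk_usage); format_bytes and has_room_for are hypothetical helpers. Keep in mind that extracting the archive needs additional space beyond the tarball itself.

```python
import shutil
from pathlib import Path
from typing import Union


def format_bytes(num_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {units[-1]}"


def has_room_for(num_bytes: int, directory: Union[str, Path] = ".") -> bool:
    """Return True if `directory` has at least `num_bytes` of free space."""
    free = shutil.disk_usage(Path(directory).expanduser()).free
    return free >= num_bytes


archive_size = 3504256089  # the value returned by collection.archive_size above
print(format_bytes(archive_size))  # 3.26 GiB
print(has_room_for(archive_size, "."))  # use your real output directory here
```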

Collection archives are gzipped tarballs. You can read more about the structure of these archives in this Medium post.
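
Python's standard tarfile module can open these archives directly. The sketch below, built around a hypothetical extract_archive helper, lists the archive members and extracts them; it applies the "data" extraction filter when the running Python version provides it, which rejects members that would write outside the output directory.

```python
import tarfile
from pathlib import Path
from typing import List, Union


def extract_archive(archive_path: Union[str, Path], output_dir: Union[str, Path]) -> List[str]:
    """Extract a .tar.gz archive into output_dir and return the member names."""
    archive_path = Path(archive_path).expanduser()
    output_dir = Path(output_dir).expanduser()
    output_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and some security backports): safer extraction.
            tar.extractall(output_dir, filter="data")
        else:
            tar.extractall(output_dir)
    return names


# Example (path taken from the download example above):
# extract_archive("~/Downloads/sn1_AOI_1_RIO.tar.gz", "~/Downloads/sn1_AOI_1_RIO")
```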