Utility functions for datasets.


im_decode(im_bytes[, mode, backend])

Decode bytes to an image (numpy array, RGB).

npy_decode(npy_bytes[, key])

Decode npy/npz file bytes to a numpy array.

ply_decode(ply_bytes[, mode])

Decode bytes to a point cloud (numpy array).


Print the given class frequencies.

to_onehot(categories, num_classes)

Transform integer categorical labels to one-hot vectors.

DatasetFromList(lst[, deepcopy, serialize])

Wrap a list in a torch Dataset.

Caches a mapping for fast I/O and multi-processing.

This class caches a mapping from the dataset index requested in a __getitem__ call to a dictionary holding the information needed to load that sample from disk. Loading the cached mapping instead of recomputing it at every startup reduces startup time.

NOTE: Make sure your annotations file is up to date. Otherwise the cached mapping will be stale and you will get the wrong samples.
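
The caching idea described above can be sketched roughly as follows. This is only an illustration, not the class's actual implementation: the function name load_or_build_mapping, the use of pickle as the on-disk format, and the cache_path argument are all assumptions made for the example.

```python
import os
import pickle


def load_or_build_mapping(cache_path, build_mapping):
    """Hypothetical helper: load a cached index mapping or build and store it.

    build_mapping is any zero-argument callable returning a dict that maps
    a dataset index to the information needed to load that sample.
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the (possibly slow) annotation parsing.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: compute the mapping once and store it for later runs.
    mapping = build_mapping()
    with open(cache_path, "wb") as f:
        pickle.dump(mapping, f)
    return mapping
```

Note that this sketch never invalidates the cache, which is why a changed annotations file would yield wrong samples until the cached file is deleted.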

class DatasetFromList(lst, deepcopy=False, serialize=True)[source]

Wrap a list in a torch Dataset.

We serialize large Python objects and wrap them in a torch Dataset because holding large Python objects leaks memory when multiple workers are used. See:

Creates an instance of the class.

Return the item of the list at idx.

Return type:



Return the length of the list.

Return type:


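One common way to serialize a list for this purpose is to pickle every item and store the results in a single flat byte buffer plus an array of end offsets, so that worker processes share two arrays instead of many small Python objects. The sketch below illustrates that idea under stated assumptions: the class name SerializedList is hypothetical, it does not subclass a torch Dataset in order to stay dependency-light, and it is not necessarily how DatasetFromList is implemented.

```python
import pickle

import numpy as np


class SerializedList:
    """Hypothetical sketch: keep list items as pickled bytes in one buffer."""

    def __init__(self, lst):
        blobs = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in lst]
        # End offset of every item inside the flat buffer.
        self._ends = np.cumsum([len(blob) for blob in blobs], dtype=np.int64)
        # One contiguous uint8 array holding all pickled items back to back.
        self._buffer = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, idx):
        # Only non-negative indices are handled in this sketch.
        start = 0 if idx == 0 else int(self._ends[idx - 1])
        end = int(self._ends[idx])
        return pickle.loads(self._buffer[start:end].tobytes())
```

Each access deserializes a fresh copy of the item, which costs some CPU time per lookup in exchange for the flat memory layout.
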
filter_by_keys(data_dict, keys_to_keep)[source]

Filter a dictionary by keys.

Parameters:

  • data_dict (DictData) – The dictionary to filter.

  • keys_to_keep (list[str]) – The keys to keep.

Returns:

The filtered dictionary.

Return type:


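A minimal sketch of the filtering described for filter_by_keys, using a plain dict in place of DictData. The name filter_by_keys_sketch is hypothetical, and skipping keys that are missing from the input (rather than raising an error) is an assumption of this example, not documented behavior.

```python
def filter_by_keys_sketch(data_dict, keys_to_keep):
    """Return a new dict containing only the requested keys.

    Assumption: keys absent from data_dict are silently skipped.
    """
    return {key: data_dict[key] for key in keys_to_keep if key in data_dict}


sample = {"images": "img", "boxes2d": "boxes", "depth_maps": "depth"}
filtered = filter_by_keys_sketch(sample, ["images", "boxes2d"])
```

Here filtered keeps only the "images" and "boxes2d" entries, and sample is left unchanged.
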
get_used_data_groups(data_groups, keys)[source]

Get the data groups that are used by the given keys.

Parameters:

  • data_groups (dict[str, list[str]]) – The data groups.

  • keys (list[str]) – The keys to check.

Returns:

The used data groups.

Return type:


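The documentation above does not spell out when a group counts as used. The sketch below assumes that a group is used if at least one of its keys appears in keys, and that the result is a list of group names. Both points, the name get_used_data_groups_sketch, and the example group names are assumptions made for illustration.

```python
def get_used_data_groups_sketch(data_groups, keys):
    """Return the names of groups that contain at least one requested key.

    Assumption: "used" means the group shares at least one key with keys.
    """
    requested = set(keys)
    return [
        group
        for group, group_keys in data_groups.items()
        if requested.intersection(group_keys)
    ]


groups = {
    "images": ["images", "input_hw"],
    "boxes2d": ["boxes2d", "boxes2d_classes"],
    "depth": ["depth_maps"],
}
used = get_used_data_groups_sketch(groups, ["images", "boxes2d_classes"])
```

With these inputs, used contains "images" and "boxes2d" but not "depth".
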
im_decode(im_bytes, mode='RGB', backend='PIL')[source]

Decode bytes to an image (numpy array, RGB).

Return type:

ndarray[Any, dtype[uint8]]

npy_decode(npy_bytes, key=None)[source]

Decode npy/npz file bytes to a numpy array.

Return type:

Union[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float64]]]
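
Decoding npy/npz bytes can be sketched with numpy.load on an in-memory buffer. The function name npy_decode_sketch is hypothetical, and treating key as the array name inside an npz archive is an assumption based on the signature above rather than confirmed behavior.

```python
import io

import numpy as np


def npy_decode_sketch(npy_bytes, key=None):
    """Decode the bytes of a .npy or .npz file into a numpy array."""
    loaded = np.load(io.BytesIO(npy_bytes))
    if isinstance(loaded, np.ndarray):
        # Plain .npy file: np.load returns the array directly.
        return loaded
    # .npz archive: np.load returns a mapping from array names to arrays.
    if key is None:
        raise ValueError("A key is required to select an array from an npz archive.")
    return loaded[key]
```

A caller would pass the raw file content, for example the result of reading a file opened in binary mode.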

ply_decode(ply_bytes, mode='XYZI')[source]

Decode bytes to a point cloud (numpy array).

Parameters:

  • ply_bytes (bytes) – The bytes of the ply file.

  • mode (str, optional) – The point format of the ply file. If “XYZI”, the intensity channel is included; otherwise only the XYZ coordinates are returned. Defaults to “XYZI”.

Return type:

Union[ndarray[Any, dtype[float32]], ndarray[Any, dtype[float64]]]


Print the given class frequencies.

Return type:


to_onehot(categories, num_classes)[source]

Transform integer categorical labels to one-hot vectors.

Parameters:

  • categories (NDArrayI64) – Integer categorical labels of shape (N, ).

  • num_classes (int) – Number of classes.

Returns:

One-hot vectors of shape (N, num_classes).

Return type:

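The transformation can be sketched with numpy indexing. The name to_onehot_sketch and the float32 output dtype are assumptions of this example, since the return type is not stated above.

```python
import numpy as np


def to_onehot_sketch(categories, num_classes):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, num_classes)."""
    num_samples = categories.shape[0]
    # Assumption: float32 output; the real return dtype is not documented here.
    onehot = np.zeros((num_samples, num_classes), dtype=np.float32)
    # Set exactly one entry per row: the column given by that row's label.
    onehot[np.arange(num_samples), categories] = 1.0
    return onehot


labels = np.array([0, 2, 1], dtype=np.int64)
onehot = to_onehot_sketch(labels, num_classes=3)
```

Each row of onehot sums to one, and taking the argmax over the last axis recovers the original labels.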