vis4d.data.datasets

Datasets module.

class BDD100K(data_root, annotation_path, keys_to_load=('images', 'boxes2d'), category_map=None, config_path=None, global_instance_ids=False, bg_as_class=False, skip_empty_samples=False, attributes_to_load=None, cache_as_binary=False, cached_file_path=None, **kwargs)[source]

BDD100K type dataset, based on Scalabel.

Creates an instance of the class.

Parameters:
  • data_root (str) – Root directory of the data.

  • annotation_path (str) – Path to the annotation json(s).

  • keys_to_load (Sequence[str, ...], optional) – Keys to load from the dataset. Defaults to (K.images, K.boxes2d).

  • category_map (None | CategoryMap, optional) – Mapping from a Scalabel category string to an integer index. If None, the standard mapping in the dataset config will be used. Defaults to None.

  • config_path (None | str | Config, optional) – Path to the dataset config, can be added if it is not provided together with the labels or should be modified. Defaults to None.

  • global_instance_ids (bool) – Whether to convert the tracking IDs of annotations into dataset-global IDs or keep local, per-video IDs. Defaults to False.

  • bg_as_class (bool) – Whether to include background pixels as an additional class for masks.

  • skip_empty_samples (bool) – Whether to skip samples without annotations.

  • attributes_to_load (Sequence[dict[str, str]]) – List of attribute dictionaries to load. Each dictionary maps an attribute name to its desired value. If any of the attribute dictionaries matches, the corresponding frame is loaded. Defaults to None.

  • cache_as_binary (bool) – Whether to cache the dataset as binary. Default: False.

  • cached_file_path (str | None) – Path to a cached file. If the cached file exists, it is loaded instead of generating the data mapping. Default: None.
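
The attributes_to_load filter described above loads a frame if any of the given attribute dictionaries matches the frame's attributes. A minimal pure-Python sketch of that matching rule (the function name frame_matches and the example attribute names are illustrative, not part of the vis4d API):

```python
def frame_matches(frame_attributes, attributes_to_load):
    """Return True if the frame's attributes match any requested dict."""
    if attributes_to_load is None:
        return True  # no filter: load every frame
    # A single dict matches only if ALL of its key/value pairs match;
    # the frame is loaded if ANY dict matches.
    return any(
        all(frame_attributes.get(k) == v for k, v in attrs.items())
        for attrs in attributes_to_load
    )

# A frame tagged as rainy night matches the second filter below.
frame = {"weather": "rainy", "timeofday": "night"}
filters = [{"weather": "clear"}, {"timeofday": "night"}]
print(frame_matches(frame, filters))  # True
```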

__repr__()[source]

Concise representation of the dataset.

Return type:

str

class COCO(data_root, keys_to_load=('images', 'boxes2d', 'boxes2d_classes', 'instance_masks'), split='train2017', remove_empty=False, use_pascal_voc_cats=False, cache_as_binary=False, cached_file_path=None, **kwargs)[source]

COCO dataset class.

Initialize the COCO dataset.

Parameters:
  • data_root (str) – Path to the root directory of the dataset.

  • keys_to_load (tuple[str, ...]) – Keys to load from the dataset.

  • split (str) – Which split to load. Default: "train2017".

  • remove_empty (bool) – Whether to remove images with no annotations.

  • use_pascal_voc_cats (bool) – Whether to use Pascal VOC categories.

  • cache_as_binary (bool) – Whether to cache the dataset as binary. Default: False.

  • cached_file_path (str | None) – Path to a cached file. If the cached file exists, it is loaded instead of generating the data mapping. Default: None.

__getitem__(idx)[source]

Transform coco sample to vis4d input format.

Return type:

Dict[str, Any]

Returns:

DataDict[DataKeys, Union[torch.Tensor, Dict[str, Any]]]

__len__()[source]

Return length of dataset.

Return type:

int

__repr__()[source]

Concise representation of the dataset.

Return type:

str

class Dataset(image_channel_mode='RGB', data_backend=None)[source]

Basic pytorch dataset with defined return type.

Initialize dataset.

Parameters:
  • image_channel_mode (str) – Image channel mode to use. Default: RGB.

  • data_backend (None | DataBackend) – Data backend to use. Default: None.

__getitem__(idx)[source]

Convert single element at given index into Vis4D data format.

Return type:

Dict[str, Any]

__len__()[source]

Return length of dataset.

Return type:

int

validate_keys(keys_to_load)[source]

Validate that all keys to load are supported.

Parameters:

keys_to_load (list[str]) – List of keys to load.

Raises:

ValueError – Raise if any key is not defined in AVAILABLE_KEYS.

Return type:

None
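
The key validation above can be sketched as follows. The AVAILABLE_KEYS tuple here is an illustrative subset only; each dataset defines its own set of supported keys:

```python
# Illustrative subset of supported keys; the real AVAILABLE_KEYS of a
# given dataset may differ.
AVAILABLE_KEYS = ("images", "boxes2d", "boxes2d_classes", "instance_masks")

def validate_keys(keys_to_load):
    """Raise ValueError if any requested key is not supported."""
    for key in keys_to_load:
        if key not in AVAILABLE_KEYS:
            raise ValueError(f"Key '{key}' is not defined in AVAILABLE_KEYS.")

validate_keys(["images", "boxes2d"])  # passes silently
try:
    validate_keys(["images", "depth_maps"])
except ValueError as err:
    print(err)
```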

class S3DIS(data_root, split='trainNoArea5', keys_to_load=('points3d', 'colors3d', 'semantics3d', 'instances3d'), cache_points=True, cache_as_binary=False, cached_file_path=None, **kwargs)[source]

S3DIS dataset class.

Creates a new S3DIS dataset.

Parameters:
  • data_root (str) – Path to S3DIS folder

  • split (str) – Which split to load. Must be either trainNoArea[1-6] or testArea[1-6], e.g. trainNoArea5 loads all areas except area 5, while testArea5 loads only area 5.

  • keys_to_load (list[str]) – What kind of data should be loaded (e.g. colors, xyz, semantics, …)

  • cache_points (bool) – If True, caches loaded points instead of reading them from disk every time.

  • cache_as_binary (bool) – Whether to cache the dataset as binary. Default: False.

  • cached_file_path (str | None) – Path to a cached file. If the cached file exists, it is loaded instead of generating the data mapping. Default: None.

Raises:

ValueError – If requested split is malformed.
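
The split semantics described above (trainNoArea[1-6] loads every area except the given one, testArea[1-6] loads only that area) can be sketched as follows; areas_for_split is a hypothetical helper written for illustration, not part of the vis4d API:

```python
import re

def areas_for_split(split):
    """Return the list of S3DIS area numbers selected by a split string."""
    m = re.fullmatch(r"(trainNoArea|testArea)([1-6])", split)
    if m is None:
        raise ValueError(f"Malformed split: {split}")
    mode, area = m.group(1), int(m.group(2))
    all_areas = set(range(1, 7))
    # trainNoAreaN excludes area N; testAreaN selects only area N.
    return sorted(all_areas - {area}) if mode == "trainNoArea" else [area]

print(areas_for_split("trainNoArea5"))  # [1, 2, 3, 4, 6]
print(areas_for_split("testArea5"))    # [5]
```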

__getitem__(idx)[source]

Transform S3DIS sample to vis4d input format.

Returns:

coordinates – 3D point coordinates, shape (n, 3).
colors – 3D point colors, shape (n, 3).
semantic classes – 3D point classes, shape (n, 1).

Raises:

ValueError – If a requested key does not exist in this dataset.

__len__()[source]

Length of the dataset.

Return type:

int

__repr__()[source]

Concise representation of the dataset.

Return type:

str

property num_classes: int

The number of classes in the dataset.

class TorchvisionClassificationDataset(detection_ds)[source]

Wrapper for torchvision classification datasets.

This class wraps torchvision classification datasets and converts them to the format that is expected by the vis4d framework.

It expects the torchvision dataset to return a tuple of (image, class_id) where the image is a PIL Image and the class_id is an integer.

If you want to use a torchvision dataset that returns a different format, you can provide a custom data_converter function to the TorchvisionDataset class.

The returned sample will have the following keys and values:

  • images – ndarray of shape (1, H, W, C).

  • categories – ndarray of dimension 1.

Example:

>>> from torchvision.datasets.mnist import MNIST
>>> ds = TorchvisionClassificationDataset(
...     MNIST("data/mnist_ds", train=False)
... )
>>> data = next(iter(ds))
>>> print(data.keys())
dict_keys(['images', 'categories'])

Creates a new instance of the class.

Parameters:

detection_ds (VisionDataset) – Torchvision dataset that returns a tuple of (image, class_id) where the image is a PIL Image and the class_id is an integer.

class TorchvisionDataset(torchvision_ds, data_converter)[source]

Wrapper for torchvision datasets.

This class wraps torchvision datasets and converts them to the format that is expected by the vis4d framework.

The return value of the torchvision dataset is passed to the data_converter, which must be provided by the user. The data_converter is expected to return a DictData object following the vis4d conventions.

For well-defined data formats, such as classification, wrappers are already implemented. See TorchvisionClassificationDataset for an example.

Creates a new instance of the class.

Parameters:
  • torchvision_ds (VisionDataset) – Torchvision dataset that should be converted.

  • data_converter (Callable[[Any], DictData]) – Function that converts the output of the torchvision dataset's __getitem__ to the format expected by the vis4d framework.
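
A minimal sketch of a data_converter for a classification-style dataset that yields (image, class_id) tuples, assuming the output conventions listed for TorchvisionClassificationDataset above (images as (1, H, W, C), categories as a 1-element array). The converter itself is an illustrative example, not the one shipped with vis4d:

```python
import numpy as np

def classification_converter(sample):
    """Convert a (image, class_id) tuple into a vis4d-style data dict."""
    image, class_id = sample
    return {
        "images": np.asarray(image)[None, ...],  # add batch dim: (1, H, W, C)
        "categories": np.array([class_id]),
    }

# Would be passed to the wrapper roughly like:
# ds = TorchvisionDataset(MNIST("data/mnist_ds", train=False),
#                         classification_converter)
fake = (np.zeros((28, 28, 1), dtype=np.uint8), 7)
data = classification_converter(fake)
print(data["images"].shape, data["categories"])  # (1, 28, 28, 1) [7]
```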

__getitem__(idx)[source]

Returns a new sample from the dataset.

Parameters:

idx (int) – Index of the sample.

Returns:

Data in vis4d format.

Return type:

DictData

__len__()[source]

Returns the number of samples in the dataset.

Returns:

Length of the dataset.

Return type:

int

class VideoDataset(*args, **kwargs)[source]

Video datasets.

Provides a video_mapping attribute for the video-based interface and reference view samplers.

Initialize dataset.

Modules

vis4d.data.datasets.base

Base dataset classes.

vis4d.data.datasets.bdd100k

BDD100K dataset.

vis4d.data.datasets.coco

COCO dataset.

vis4d.data.datasets.imagenet

ImageNet 1k dataset.

vis4d.data.datasets.nuscenes

NuScenes multi-sensor video dataset.

vis4d.data.datasets.nuscenes_detection

NuScenes multi-sensor video dataset.

vis4d.data.datasets.nuscenes_mono

NuScenes monocular dataset.

vis4d.data.datasets.nuscenes_trajectory

NuScenes trajectory dataset.

vis4d.data.datasets.s3dis

Stanford 3D indoor dataset.

vis4d.data.datasets.scalabel

Scalabel type dataset.

vis4d.data.datasets.shift

SHIFT dataset.

vis4d.data.datasets.torchvision

Provides functionalities to wrap torchvision datasets.

vis4d.data.datasets.util

Utility functions for datasets.