vis4d.data.datasets.torchvision

Provides functionalities to wrap torchvision datasets.

Classes

TorchvisionClassificationDataset(detection_ds)

Wrapper for torchvision classification datasets.

TorchvisionDataset(torchvision_ds, ...)

Wrapper for torchvision datasets.

class TorchvisionClassificationDataset(detection_ds)[source]

Wrapper for torchvision classification datasets.

This class wraps torchvision classification datasets and converts them to the format that is expected by the vis4d framework.

It expects the torchvision dataset to return a tuple of (image, class_id) where the image is a PIL Image and the class_id is an integer.

If you want to use a torchvision dataset that returns a different format, you can provide a custom data_converter function to the TorchvisionDataset class.

The returned sample has the following keys and values: images, an ndarray of shape (1, H, W, C), and categories, an ndarray of shape (1,).

Example:

>>> from torchvision.datasets.mnist import MNIST
>>> ds = TorchvisionClassificationDataset(
...     MNIST("data/mnist_ds", train=False)
... )
>>> data = next(iter(ds))
>>> print(data.keys())
dict_keys(['images', 'categories'])

Creates a new instance of the class.

Parameters:

detection_ds (VisionDataset) – Torchvision dataset that returns a tuple of (image, class_id) where the image is a PIL Image and the class_id is an integer.

class TorchvisionDataset(torchvision_ds, data_converter)[source]

Wrapper for torchvision datasets.

This class wraps torchvision datasets and converts them to the format that is expected by the vis4d framework.

The return value of the torchvision dataset is passed to the data_converter, which must be provided by the user. The data_converter is expected to return a DictData object following the vis4d conventions.

For well-defined data formats, such as classification, ready-made wrappers are available. See TorchvisionClassificationDataset for an example.

Creates a new instance of the class.

Parameters:
  • torchvision_ds (VisionDataset) – Torchvision dataset that should be converted.

  • data_converter (Callable[[Any], DictData]) – Function that converts the output of the torchvision dataset's __getitem__ to the format expected by the vis4d framework.
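A sketch of what such a data_converter might look like for a dataset returning (image, class_id) tuples. The function name my_converter is hypothetical, and DictData is treated here as a plain dict keyed with "images" and "categories" (an assumption based on the format described above); only the (1, H, W, C) / length-1 layout is taken from this page.

```python
import numpy as np


def my_converter(sample):
    """Hypothetical converter for a (image, class_id) torchvision sample.

    Produces the layout described above: images as an ndarray of shape
    (1, H, W, C) and categories as an ndarray of shape (1,).
    """
    image, class_id = sample
    # PIL Images convert directly via np.asarray; grayscale images
    # come out as (H, W), so add a trailing channel dimension.
    img = np.asarray(image, dtype=np.float32)
    if img.ndim == 2:
        img = img[..., None]
    return {
        "images": img[None, ...],            # (1, H, W, C)
        "categories": np.array([class_id]),  # (1,)
    }


# Usage (sketch): pass the converter alongside the raw dataset, e.g.
# ds = TorchvisionDataset(MNIST("data/mnist_ds", train=False), my_converter)
```

Wrappers such as TorchvisionClassificationDataset implement essentially this conversion for you; a custom converter is only needed for datasets with other return formats.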

__getitem__(idx)[source]

Returns a new sample from the dataset.

Parameters:

idx (int) – Index of the sample.

Returns:

Data in vis4d format.

Return type:

DictData

__len__()[source]

Returns the number of samples in the dataset.

Returns:

Length of the dataset.

Return type:

int