vis4d.data.iterable

Iterable datasets.

Classes

SubdividingIterableDataset(dataset, ...[, ...])

Subdivides a given dataset into smaller chunks.

class SubdividingIterableDataset(dataset, n_samples_per_batch, preprocess_fn=<function SubdividingIterableDataset.<lambda>>)

Subdivides a given dataset into smaller chunks.

This also adds a field called ‘index’ (DataKeys.index) to the data struct in order to relate the data to the source index.

Example: Given a dataset (ds) that outputs tensors of shape (10, 3):

sub_ds = SubdividingIterableDataset(ds, n_samples_per_batch=5)

next(iter(sub_ds))['key'].shape >> torch.Size([5, 3])

next(iter(DataLoader(sub_ds, batch_size=4)))['key'].shape >> torch.Size([4, 5, 3])

Assuming the dataset returns two entries with shape (10, 3): [e['index'].item() for e in sub_ds] >> [0, 0, 1, 1]

Creates a new Dataset.

Parameters:
  • dataset (Dataset) – The dataset which should be subdivided.

  • n_samples_per_batch (int) – How many samples each batch should contain. The first dimension of dataset[0].shape must be divisible by this number.

  • preprocess_fn (Callable[[list[DictData]], list[DictData]]) – Preprocessing function. Defaults to identity.
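
The subdividing behaviour described above can be illustrated with a minimal, dependency-free sketch. It is not the vis4d implementation: the helper name subdivide is hypothetical, plain nested lists stand in for tensors, and calling preprocess_fn on a one-element list is an assumption based only on the documented signature.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Sample = Dict[str, Any]


def subdivide(
    samples: Iterable[Sample],
    n_samples_per_batch: int,
    preprocess_fn: Callable[[List[Sample]], List[Sample]] = lambda x: x,
) -> Iterator[Sample]:
    """Yield chunks of n_samples_per_batch rows from every source sample.

    Hypothetical stand-in for SubdividingIterableDataset.__iter__.
    """
    for source_index, sample in enumerate(samples):
        # Assumption: the preprocessing hook receives a one-element batch.
        (sample,) = preprocess_fn([sample])
        # Use the first field to determine how many rows the sample holds.
        n_rows = len(next(iter(sample.values())))
        if n_rows % n_samples_per_batch != 0:
            raise ValueError(
                f"{n_rows} rows are not divisible by {n_samples_per_batch}"
            )
        for start in range(0, n_rows, n_samples_per_batch):
            stop = start + n_samples_per_batch
            # Slice every field along its first dimension.
            chunk = {key: value[start:stop] for key, value in sample.items()}
            # Record which source sample this chunk came from.
            chunk["index"] = source_index
            yield chunk


# Two source samples, each holding 10 rows of 3 values.
source = [{"key": [[i, i, i] for i in range(10)]} for _ in range(2)]
chunks = list(subdivide(source, n_samples_per_batch=5))
indices = [chunk["index"] for chunk in chunks]
```

Each source sample of 10 rows yields two chunks of 5 rows, so indices pairs every chunk with the position of the sample it was cut from, mirroring the 'index' field described in the docstring.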

__getitem__(index)

Indexing is not supported for IterableDatasets.

Return type:

Dict[str, Any]

__iter__()

Iterates over the dataset, supporting distributed sampling.

Return type:

Iterator[Dict[str, Any]]
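
The page does not say how distributed sampling is implemented. One common scheme for iterable datasets, shown here purely as an assumption, is round-robin sharding: each process keeps every world_size-th element, starting at its own rank. The helper name shard and the parameters rank and world_size are hypothetical and not part of the vis4d API.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard(items: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Return an iterator over every world_size-th item, starting at rank.

    Hypothetical helper; not taken from vis4d.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    # islice(iterable, start, stop, step) skips the items owned by other ranks.
    return islice(items, rank, None, world_size)


# Six elements split across three processes.
shards = [list(shard(range(6), rank, 3)) for rank in range(3)]
```

With this scheme the shards are disjoint and together cover the whole stream, so no element is yielded by more than one process.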