Getting started

This notebook illustrates the basic usage of Vis4D by running Faster R-CNN on COCO images.

CLI

The following example uses the provided Faster R-CNN toy config file to run training and inference on COCO images.
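The commands in this section pass `--config.params.num_epochs 1`, which overrides a single nested config value from the command line. As a rough illustration of the idea only, the sketch below applies a dotted override to a plain nested dict. The `apply_override` helper and the config layout are hypothetical and are not Vis4D's actual implementation.

```python
from copy import deepcopy


def apply_override(config: dict, dotted_key: str, value):
    """Return a copy of `config` with the nested entry at `dotted_key` replaced."""
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate levels on demand so new keys can be set too.
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical config layout; the real config file may be structured differently.
base = {"params": {"num_epochs": 12, "lr": 0.02}}
overridden = apply_override(base, "params.num_epochs", 1)

print(overridden["params"]["num_epochs"])  # value after the override
print(base["params"]["num_epochs"])        # original config is left untouched
```

Returning a copy rather than mutating in place keeps the base config reusable across several runs with different overrides.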

Inference

Run inference on the validation set. In this toy setup, the log shows that it is loaded from the `train` split of `data/coco_test`.

!vis4d test --config faster_rcnn_example.py --config.params.num_epochs 1
[09/06 00:38:10 Vis4D]: Environment info: PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.5.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2

Versions of relevant libraries:
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.2
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] numpy-base                1.23.4          py310haf87e8b_0  
[conda] pytorch                   1.13.0                 py3.10_0    pytorch
[conda] pytorch-lightning         2.0.3                    pypi_0    pypi
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchaudio                2.0.2                    pypi_0    pypi
[conda] torchmetrics              0.11.4                   pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi
[09/06 00:38:10 Vis4D]: Load checkpoint from http path: https://download.pytorch.org/models/resnet50-0676ba61.pth
/Users/royyang/Workspace/vis4d/vis4d/common/ckpt.py:374: UserWarning: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

  rank_zero_warn(err_msg)
[09/06 00:38:10 Vis4D]: Load checkpoint from http path: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
[09/06 00:38:10 Vis4D]: Generating COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) data mapping...
[09/06 00:38:10 Vis4D]: Loading COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) takes 0.00 seconds.
[09/06 00:38:12 Vis4D]: Testing: 1/2, ETA: 0:00:01, 1.71s/it
[09/06 00:38:13 Vis4D]: Testing: 2/2, ETA: 0:00:00, 1.61s/it
[09/06 00:38:13 Vis4D]: Running evaluator CocoEvaluator(annotation_path=data/coco_test/annotations/instances_train.json)...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[09/06 00:38:13 Vis4D]: Det/AP: 0.3974
[09/06 00:38:13 Vis4D]: Det/AP50: 0.5497
[09/06 00:38:13 Vis4D]: Det/AP75: 0.4414
[09/06 00:38:13 Vis4D]: Det/APs: 0.2868
[09/06 00:38:13 Vis4D]: Det/APm: 0.6932
[09/06 00:38:13 Vis4D]: Det/APl: 0.6000
[09/06 00:38:13 Vis4D]: Showing results for metric: Det
[09/06 00:38:13 Vis4D]: 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.550
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.287
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.693
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.454
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633
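The summary above reports AP and AR at different IoU (intersection over union) thresholds, where `IoU=0.50:0.95` is the COCO convention of averaging over thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch of how IoU is computed for two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the example coordinates are assumptions for illustration; the evaluator's own implementation may differ in details.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up prediction and ground-truth boxes.
pred = (10.0, 10.0, 50.0, 50.0)
gt = (20.0, 20.0, 60.0, 60.0)
iou = box_iou(pred, gt)

# A prediction counts as a match at a given threshold only if IoU >= threshold.
print(f"IoU = {iou:.3f}, match at 0.50: {iou >= 0.5}")
```

Here the overlap is 900 square pixels and the union is 2300, so the IoU is about 0.39 and the prediction would not count as a match even at the loosest threshold of 0.50.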

Training

Train for one epoch, then run inference on the validation set.

!vis4d fit --config faster_rcnn_example.py --config.params.num_epochs 1
[09/06 00:38:21 Vis4D]: Environment info: PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.5.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.3 (clang-1403.0.22.14.1)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.8 (main, Nov 24 2022, 08:08:27) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-13.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2

Versions of relevant libraries:
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.3
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.15.2
[conda] numpy                     1.24.3                   pypi_0    pypi
[conda] numpy-base                1.23.4          py310haf87e8b_0  
[conda] pytorch                   1.13.0                 py3.10_0    pytorch
[conda] pytorch-lightning         2.0.3                    pypi_0    pypi
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchaudio                2.0.2                    pypi_0    pypi
[conda] torchmetrics              0.11.4                   pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi
[09/06 00:38:21 Vis4D]: Load checkpoint from http path: https://download.pytorch.org/models/resnet50-0676ba61.pth
/Users/royyang/Workspace/vis4d/vis4d/common/ckpt.py:374: UserWarning: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

  rank_zero_warn(err_msg)
[09/06 00:38:21 Vis4D]: Load checkpoint from http path: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
[09/06 00:38:21 Vis4D]: [rank 0] Global seed set to 1740432514
[09/06 00:38:22 Vis4D]: Generating COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) data mapping...
[09/06 00:38:22 Vis4D]: Loading COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) takes 0.00 seconds.
[09/06 00:38:22 Vis4D]: Generating COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) data mapping...
[09/06 00:38:22 Vis4D]: Loading COCODataset(root=data/coco_test, split=train, use_pascal_voc_cats=False) takes 0.00 seconds.
[09/06 00:38:26 Vis4D]: Epoch 1: 1/2, ETA: 0:00:04, 4.03s/it, loss: 0.9780, RPNLoss.loss_cls: 0.0689, RPNLoss.loss_bbox: 0.0667, RCNNLoss.rcnn_loss_cls: 0.4852, RCNNLoss.rcnn_loss_bbox: 0.3572
[09/06 00:38:29 Vis4D]: Epoch 1: 2/2, ETA: 0:00:00, 3.97s/it, loss: 1.0499, RPNLoss.loss_cls: 0.0906, RPNLoss.loss_bbox: 0.1142, RCNNLoss.rcnn_loss_cls: 0.4352, RCNNLoss.rcnn_loss_bbox: 0.4099
[09/06 00:38:32 Vis4D]: Testing: 1/2, ETA: 0:00:01, 1.27s/it
[09/06 00:38:33 Vis4D]: Testing: 2/2, ETA: 0:00:00, 1.24s/it
[09/06 00:38:33 Vis4D]: Running evaluator CocoEvaluator(annotation_path=data/coco_test/annotations/instances_train.json)...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[09/06 00:38:33 Vis4D]: Det/AP: 0.3975
[09/06 00:38:33 Vis4D]: Det/AP50: 0.5500
[09/06 00:38:33 Vis4D]: Det/AP75: 0.4414
[09/06 00:38:33 Vis4D]: Det/APs: 0.2868
[09/06 00:38:33 Vis4D]: Det/APm: 0.6932
[09/06 00:38:33 Vis4D]: Det/APl: 0.6000
[09/06 00:38:33 Vis4D]: Showing results for metric: Det
[09/06 00:38:33 Vis4D]: 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.550
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.287
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.693
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.600
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.429
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.454
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.348
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.633

[09/06 00:38:33 Vis4D]: `Trainer.fit` stopped: `num_epochs=1` reached.
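In both training iterations logged above, the reported `loss` matches the sum of the four component losses (two RPN terms and two R-CNN terms) to four decimal places. The sketch below parses the first of those log lines with a regular expression and checks this. It only describes this log; whether the total is always an unweighted sum depends on the loss configuration, which is not shown here.

```python
import re

# First training iteration, copied from the log above (timestamp prefix omitted).
line = (
    "Epoch 1: 1/2, ETA: 0:00:04, 4.03s/it, loss: 0.9780, "
    "RPNLoss.loss_cls: 0.0689, RPNLoss.loss_bbox: 0.0667, "
    "RCNNLoss.rcnn_loss_cls: 0.4852, RCNNLoss.rcnn_loss_bbox: 0.3572"
)

# Collect "name: value" pairs whose value is a decimal number.
pairs = re.findall(r"([\w.]+): (\d+\.\d+)", line)
components = {name: float(value) for name, value in pairs}

# Separate the reported total from the individual loss terms.
total = components.pop("loss")
component_sum = sum(components.values())

print(f"reported total: {total:.4f}")
print(f"sum of components: {component_sum:.4f}")
```

The same pattern can be applied to every `Epoch` line of a longer log to track how each component evolves during training.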

Python API

You can also compose the model and dataset through the Python API.

First, import the necessary components from the library.

from vis4d.model.detect.faster_rcnn import FasterRCNN

from vis4d.data.const import CommonKeys as K
from vis4d.vis.functional.image import imshow_bboxes

from vis4d.config import instantiate_classes
from vis4d.zoo.base.datasets.coco import get_coco_detection_cfg

Now, let’s create the dataset and fetch a batch of images from it.

# Create dataloader for COCO using the default config
dataloader_cfg = get_coco_detection_cfg(
    "data/coco_test/",
    train_split="train",
    test_split="train",
    samples_per_gpu=1,
    workers_per_gpu=0,
    cache_as_binary=False,
)

test_dataloader = instantiate_classes(dataloader_cfg.test_dataloader)[0]
batch = next(iter(test_dataloader))
inputs, images_hw = (
    batch[K.images],
    batch[K.input_hw],
)
Generating COCODataset(root=data/coco_test/, split=train, use_pascal_voc_cats=False) data mapping...
Loading COCODataset(root=data/coco_test/, split=train, use_pascal_voc_cats=False) takes 0.00 seconds.

Next, we can initialize the model and run it on the batch.

faster_rcnn = FasterRCNN(num_classes=80, weights="mmdet")

faster_rcnn.eval()
dets = faster_rcnn(inputs, images_hw, original_hw=images_hw)
Load checkpoint from http path: https://download.pytorch.org/models/resnet50-0676ba61.pth
/Users/royyang/Workspace/vis4d/vis4d/common/ckpt.py:374: UserWarning: The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

  rank_zero_warn(err_msg)
Load checkpoint from http path: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

Finally, let’s visualize the result.

print(inputs[0].shape, dets.boxes[0].shape)
imshow_bboxes(inputs[0], dets.boxes[0], dets.scores[0], dets.class_ids[0])
torch.Size([3, 800, 1248]) torch.Size([52, 4])
[Figure: the input image with the detected bounding boxes drawn on it]
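The printed shapes indicate 52 detected boxes with 4 coordinates each for the first image. When that many boxes clutter the visualization, a common step is to keep only detections above a score threshold before drawing them. The sketch below shows the idea on plain Python lists with made-up values standing in for `dets.boxes[0]`, `dets.scores[0]`, and `dets.class_ids[0]`. The `filter_by_score` helper and the threshold of 0.5 are hypothetical and not part of Vis4D.

```python
def filter_by_score(boxes, scores, class_ids, threshold=0.5):
    """Keep only the detections whose score is at least `threshold`."""
    keep = [i for i, score in enumerate(scores) if score >= threshold]
    return (
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
        [class_ids[i] for i in keep],
    )


# Made-up detections for illustration, one (x1, y1, x2, y2) box per entry.
boxes = [
    [12.0, 30.0, 200.0, 310.0],
    [400.0, 120.0, 520.0, 260.0],
    [5.0, 5.0, 40.0, 60.0],
]
scores = [0.92, 0.35, 0.71]
class_ids = [0, 16, 2]

kept_boxes, kept_scores, kept_ids = filter_by_score(boxes, scores, class_ids)
print(len(kept_boxes), kept_scores, kept_ids)
```

With tensors, the same selection would typically be expressed as a boolean mask over the scores applied to all three outputs.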