vis4d.op.detect.rcnn

Faster R-CNN RoI head.

Functions

`get_default_rcnn_box_codec`([target_means, ...])	Get the default bounding box encoder and decoder for RCNN.

Classes

`RCNNHead`([num_shared_convs, num_shared_fcs, ...])	Faster R-CNN RoI head.
`RCNNLoss`(box_encoder[, num_classes, ...])	RCNN loss in Faster R-CNN.
`RCNNLosses`(rcnn_loss_cls, rcnn_loss_bbox)	RCNN loss container.
`RCNNOut`(cls_score, bbox_pred)	Faster R-CNN RoI head outputs.
`RCNNTargets`(labels, label_weights, ...)	Target container.
`RoI2Det`([box_decoder, score_threshold, ...])	Post processing of RCNN results and detection generation.

class RCNNHead(num_shared_convs=0, num_shared_fcs=2, conv_out_channels=256, in_channels=256, fc_out_channels=1024, num_classes=80, roi_size=(7, 7), start_level=2)

Faster R-CNN RoI head.

This head pools the RoIs from a set of feature maps and processes them into classification / regression outputs.

Parameters:

num_shared_convs (int, optional) – Number of shared conv layers. Defaults to 0.
num_shared_fcs (int, optional) – Number of shared fc layers. Defaults to 2.
conv_out_channels (int, optional) – Number of output channels for shared conv layers. Defaults to 256.
in_channels (int, optional) – Number of channels in input feature maps. Defaults to 256.
fc_out_channels (int, optional) – Number of output channels of shared fc layers. Defaults to 1024.
num_classes (int, optional) – Number of categories. Defaults to 80.
roi_size (tuple[int, int], optional) – Size of pooled RoIs. Defaults to (7, 7).

Creates an instance of the class.

__call__(features, boxes)

Type definition for function call.

Return type: RCNNOut

forward(features, boxes)

Forward pass during training stage.

Return type: RCNNOut

class RCNNLoss(box_encoder, num_classes=80, loss_cls=<function cross_entropy>, loss_bbox=<function l1_loss>)

RCNN loss in Faster R-CNN.

This class computes the loss of RCNN given proposal boxes and their corresponding target boxes with the given box encoder.

Creates an instance of the class.

Parameters:

box_encoder (DeltaXYWHBBoxEncoder) – Encodes target boxes into box regression parameters relative to the proposal boxes.
num_classes (int, optional) – Number of object categories. Defaults to 80.
loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to F.cross_entropy.
loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to l1_loss.

forward(class_outs, regression_outs, boxes, boxes_mask, target_boxes, target_classes)

Calculate losses of RCNN head.

Parameters:

class_outs (torch.Tensor) – [M*B, num_classes] classification outputs.
regression_outs (torch.Tensor) – [M*B, regression_params] regression outputs.
boxes (list[torch.Tensor]) – List of [M, 4] proposal boxes, one tensor per batch element.
boxes_mask (list[torch.Tensor]) – Per-proposal mask: positive (1), ignore (-1), negative (0).
target_boxes (list[torch.Tensor]) – List of [M, 4] assigned target boxes for each proposal.
target_classes (list[torch.Tensor]) – List of [M,] assigned target classes for each proposal.

Returns:

Classification and regression losses.

Return type:

RCNNLosses
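
The mask convention above can be illustrated with a minimal pure-Python sketch. It is not the vis4d implementation: `rcnn_loss_sketch`, the `background_class` argument, the class-agnostic 4-value regression, and the normalization by the number of sampled proposals are all assumptions made for illustration. Ignored proposals (-1) contribute nothing, negatives (0) contribute only a background classification term, and positives (1) contribute both terms.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def rcnn_loss_sketch(class_outs, regression_outs, regression_targets,
                     boxes_mask, target_classes, background_class):
    """Hypothetical helper: masked classification and L1 regression losses.

    All inputs are flat Python lists with one entry per proposal.
    regression_targets are assumed to be already encoded box deltas.
    """
    cls_terms, box_terms = [], []
    samples = zip(class_outs, regression_outs, regression_targets,
                  boxes_mask, target_classes)
    for logits, pred, target, mask, cls in samples:
        if mask == -1:
            continue  # ignored proposals do not contribute to either loss
        # Negatives are trained towards the (assumed) background class.
        label = cls if mask == 1 else background_class
        cls_terms.append(cross_entropy(logits, label))
        if mask == 1:
            # Only positives have a meaningful regression target.
            box_terms.append(sum(abs(p - t) for p, t in zip(pred, target)))
    # Assumed normalization: average over all sampled (non-ignored) proposals.
    normalizer = max(len(cls_terms), 1)
    return sum(cls_terms) / normalizer, sum(box_terms) / normalizer
```

In the actual class, the regression targets would come from applying `box_encoder` to `boxes` and `target_boxes`, and the two loss functions are the configurable `loss_cls` and `loss_bbox`.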

class RCNNLosses(rcnn_loss_cls: torch.Tensor, rcnn_loss_bbox: torch.Tensor)

RCNN loss container.

Create new instance of RCNNLosses(rcnn_loss_cls, rcnn_loss_bbox)

rcnn_loss_bbox: Tensor – Alias for field number 1

rcnn_loss_cls: Tensor – Alias for field number 0

class RCNNOut(cls_score: torch.Tensor, bbox_pred: torch.Tensor)

Faster R-CNN RoI head outputs.

Create new instance of RCNNOut(cls_score, bbox_pred)

bbox_pred: Tensor – Alias for field number 1

cls_score: Tensor – Alias for field number 0

class RCNNTargets(labels: Tensor, label_weights: Tensor, bbox_targets: Tensor, bbox_weights: Tensor)

Target container.

Create new instance of RCNNTargets(labels, label_weights, bbox_targets, bbox_weights)

bbox_targets: Tensor – Alias for field number 2

bbox_weights: Tensor – Alias for field number 3

label_weights: Tensor – Alias for field number 1

labels: Tensor – Alias for field number 0
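
The "Alias for field number N" entries on RCNNLosses, RCNNOut, and RCNNTargets are the wording that Python named tuples generate for their fields, so each field should be reachable both by name and by position. The sketch below mirrors the layout of RCNNOut with a stand-in class; `RCNNOutSketch` is a hypothetical name, and plain lists stand in for tensors so the example runs without torch.

```python
from typing import NamedTuple


class RCNNOutSketch(NamedTuple):
    """Stand-in with the same field order as RCNNOut (lists replace tensors)."""

    cls_score: list  # field number 0
    bbox_pred: list  # field number 1


out = RCNNOutSketch(cls_score=[0.1, 0.9], bbox_pred=[0.0, 0.0, 0.1, 0.1])

# Fields can be read by name, by index, or by tuple unpacking.
scores_by_name = out.cls_score
scores_by_index = out[0]
cls_score, bbox_pred = out
```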

class RoI2Det(box_decoder=None, score_threshold=0.05, iou_threshold=0.5, max_per_img=100, class_agnostic_nms=False)

Post processing of RCNN results and detection generation.

It does the following:

1. Take the classification and regression outputs from the RCNN heads.
2. Take the proposal boxes that are RCNN inputs.
3. Determine the final box classes and select the corresponding box regression parameters.
4. Adjust the box sizes and offsets according to the regression parameters.
5. Return the final boxes.

Creates an instance of the class.

Parameters:

box_decoder (DeltaXYWHBBoxDecoder, optional) – Decodes regression parameters to detected boxes. Defaults to None. If None, it will use the default decoder.
score_threshold (float, optional) – Minimum score of a detection. Defaults to 0.05.
iou_threshold (float, optional) – IoU threshold of NMS post-processing step. Defaults to 0.5.
max_per_img (int, optional) – Maximum number of detections per image. Defaults to 100.
class_agnostic_nms (bool, optional) – Whether to use class agnostic NMS. Defaults to False.
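
How the four filtering parameters interact can be sketched in pure Python. This is an illustrative greedy NMS under stated assumptions, not the vis4d code: `postprocess` and `iou` are hypothetical helpers, boxes are assumed to be already decoded (x1, y1, x2, y2) tuples, and the strict `>` comparisons against both thresholds are a guess.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, class_ids, score_threshold=0.05,
                iou_threshold=0.5, max_per_img=100, class_agnostic_nms=False):
    """Hypothetical helper: score filtering, greedy NMS, and top-k cut."""
    # Keep candidates above the score threshold, best score first.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # A box is suppressed by an already kept box that overlaps it too
        # much; without class-agnostic NMS only same-class boxes compete.
        suppressed = any(
            (class_agnostic_nms or class_ids[i] == class_ids[j])
            and iou(boxes[i], boxes[j]) > iou_threshold
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    # Cap the number of detections returned for the image.
    keep = keep[:max_per_img]
    return [(boxes[i], scores[i], class_ids[i]) for i in keep]
```

With `class_agnostic_nms=False`, two heavily overlapping boxes of different classes both survive; setting it to True lets the higher-scoring box suppress the other regardless of class.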

__call__(class_outs, regression_outs, boxes, images_hw)

Type definition for function call.

Return type: DetOut

forward(class_outs, regression_outs, boxes, images_hw)

Convert RCNN network outputs to detections.

Parameters:

class_outs (torch.Tensor) – [B, num_classes] batched tensor of classification scores.
regression_outs (torch.Tensor) – [B, num_classes * 4] predicted box offsets.
boxes (list[torch.Tensor]) – Initial boxes (RoIs).
images_hw (list[tuple[int, int]]) – Image sizes.

Returns:

Boxes, scores, and class IDs of detections per image.

Return type:

DetOut

get_default_rcnn_box_codec(target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(0.1, 0.1, 0.2, 0.2))

Get the default bounding box encoder and decoder for RCNN.

Return type: tuple[DeltaXYWHBBoxEncoder, DeltaXYWHBBoxDecoder]
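
To show what `target_means` and `target_stds` do, the following pure-Python sketch uses the delta (dx, dy, dw, dh) parameterization commonly used in Faster R-CNN. It is assumed here that DeltaXYWHBBoxEncoder and DeltaXYWHBBoxDecoder follow this scheme; `encode_delta_xywh` and `decode_delta_xywh` are hypothetical helpers and not part of vis4d.

```python
import math

# Default normalization constants from the signature above.
TARGET_MEANS = (0.0, 0.0, 0.0, 0.0)
TARGET_STDS = (0.1, 0.1, 0.2, 0.2)


def _center_size(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode_delta_xywh(proposal, target, means=TARGET_MEANS, stds=TARGET_STDS):
    """Encode a target box as normalized deltas relative to a proposal."""
    px, py, pw, ph = _center_size(proposal)
    tx, ty, tw, th = _center_size(target)
    raw = (
        (tx - px) / pw,      # center shift in units of proposal width
        (ty - py) / ph,      # center shift in units of proposal height
        math.log(tw / pw),   # log width ratio
        math.log(th / ph),   # log height ratio
    )
    return tuple((r - m) / s for r, m, s in zip(raw, means, stds))


def decode_delta_xywh(proposal, deltas, means=TARGET_MEANS, stds=TARGET_STDS):
    """Invert encode_delta_xywh: apply normalized deltas to a proposal."""
    dx, dy, dw, dh = (d * s + m for d, m, s in zip(deltas, means, stds))
    px, py, pw, ph = _center_size(proposal)
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Under this scheme, dividing by the small standard deviations scales the regression targets up, and decoding a proposal with its own encoded deltas should reproduce the target box up to floating-point error.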