vis4d.op.detect.retinanet

RetinaNet.

Functions

• decode_multi_level_outputs(cls_out_all, ...) – Decode box energies into detections for a single image.

• get_default_anchor_generator() – Get default anchor generator.

• get_default_box_codec() – Get the default bounding box encoder and decoder.

• get_default_box_matcher() – Get default bounding box matcher.

• get_default_box_sampler() – Get default bounding box sampler.

• get_params_per_level(cls_out, reg_out, anchors) – Get top-k params from the feature output per level, per image, before NMS.

Classes

• Dense2Det(anchor_generator, box_decoder[, ...]) – Compute detections from dense network outputs.

• RetinaNetHead(num_classes, in_channels[, ...]) – RetinaNet Head.

• RetinaNetHeadLoss(anchor_generator, box_encoder) – Loss of RetinaNet head.

• RetinaNetOut(cls_score, bbox_pred) – RetinaNet head outputs.

class Dense2Det(anchor_generator, box_decoder, num_pre_nms=2000, max_per_img=1000, nms_threshold=0.7, min_box_size=(0, 0), score_thr=0.0)[source]

Compute detections from dense network outputs.

This class acts as a stateless functor that does the following:

1. Create the anchor grid for the feature grids (classification and regression outputs) at all scales.

Then, for each image and each feature level:

2. Get a top-k pre-selection of flattened classification scores and box energies from the feature output before NMS.

Then, per image across all levels:

3. Decode class scores and box energies into detection boxes and apply NMS.

Detection boxes are returned for all images.

Creates an instance of the class.

__call__(cls_outs, reg_outs, images_hw)[source]

Type definition for function call.

Return type:

DetOut

forward(cls_outs, reg_outs, images_hw)[source]

Compute detections from dense network outputs.

Generate the anchor grid for all scales. For each batch element: compute classification, regression, and anchor pairs for all scales, decode those pairs into proposals, and post-process with NMS.

Parameters:
  • cls_outs (list[torch.Tensor]) – Classification outputs of shape [N, C * A, H, W], one per scale.

  • reg_outs (list[torch.Tensor]) – Regression outputs of shape [N, 4 * A, H, W], one per scale.

  • images_hw (list[tuple[int, int]]) – List of image sizes as (height, width).

Returns:

Detection outputs.

Return type:

DetOut
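A minimal usage sketch follows. The class count, anchor count, and feature sizes are illustrative assumptions; the default components come from this module's get_default_* helpers.

import torch
from vis4d.op.detect.retinanet import (
    Dense2Det,
    get_default_anchor_generator,
    get_default_box_codec,
)

anchor_generator = get_default_anchor_generator()
_, box_decoder = get_default_box_codec()  # (encoder, decoder) pair

dense2det = Dense2Det(anchor_generator, box_decoder, num_pre_nms=2000,
                      max_per_img=1000, nms_threshold=0.7)

# Dummy dense outputs for two pyramid levels; N=1 image, C=80 classes,
# A=9 anchors per cell (C and A are assumptions, not verified defaults).
N, C, A = 1, 80, 9
cls_outs = [torch.rand(N, C * A, 64, 64), torch.rand(N, C * A, 32, 32)]
reg_outs = [torch.rand(N, 4 * A, 64, 64), torch.rand(N, 4 * A, 32, 32)]
images_hw = [(512, 512)]

dets = dense2det(cls_outs, reg_outs, images_hw)  # DetOut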

class RetinaNetHead(num_classes, in_channels, feat_channels=256, stacked_convs=4, use_sigmoid_cls=True, anchor_generator=None, box_decoder=None, box_matcher=None, box_sampler=None)[source]

RetinaNet Head.

Creates an instance of the class.

__call__(features)[source]

Type definition for call implementation.

Return type:

RetinaNetOut

forward(features)[source]

Forward pass of RetinaNet.

Parameters:

features (list[torch.Tensor]) – Feature pyramid, one tensor per level.

Returns:

Classification scores and box predictions.

Return type:

RetinaNetOut
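A short sketch of running the head on a dummy feature pyramid; the channel count, number of levels, and spatial sizes below are illustrative assumptions.

import torch
from vis4d.op.detect.retinanet import RetinaNetHead

head = RetinaNetHead(num_classes=80, in_channels=256)

# Five pyramid levels with in_channels channels each (sizes assumed).
features = [torch.rand(1, 256, s, s) for s in (64, 32, 16, 8, 4)]

out = head(features)  # RetinaNetOut
cls_score, bbox_pred = out.cls_score, out.bbox_pred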

class RetinaNetHeadLoss(anchor_generator, box_encoder, box_matcher=None, box_sampler=None, loss_cls=<function sigmoid_focal_loss>, loss_bbox=<function l1_loss>)[source]

Loss of RetinaNet head.

Creates an instance of the class.

Parameters:
  • anchor_generator (AnchorGenerator) – Generates anchor grid priors.

  • box_encoder (DeltaXYWHBBoxEncoder) – Encodes bounding boxes to the desired network output.

  • box_matcher (None | Matcher, optional) – Box matcher. Defaults to None.

  • box_sampler (None | Sampler, optional) – Box sampler. Defaults to None.

  • loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to sigmoid_focal_loss.

  • loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to l1_loss.
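A construction sketch wiring in this module's default components. The matcher and sampler are passed explicitly here; whether None falls back to these same defaults is an assumption.

from vis4d.op.detect.retinanet import (
    RetinaNetHeadLoss,
    get_default_anchor_generator,
    get_default_box_codec,
    get_default_box_matcher,
    get_default_box_sampler,
)

box_encoder, _ = get_default_box_codec()  # the loss needs the encoder half
loss = RetinaNetHeadLoss(
    anchor_generator=get_default_anchor_generator(),
    box_encoder=box_encoder,
    box_matcher=get_default_box_matcher(),
    box_sampler=get_default_box_sampler(),
)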

class RetinaNetOut(cls_score: list[torch.Tensor], bbox_pred: list[torch.Tensor])[source]

RetinaNet head outputs.

Create a new instance of RetinaNetOut(cls_score, bbox_pred).

bbox_pred: list[Tensor]

Alias for field number 1

cls_score: list[Tensor]

Alias for field number 0
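Since RetinaNetOut is a named tuple, each field can be read by name or by index; the tensor shapes below are placeholders.

import torch
from vis4d.op.detect.retinanet import RetinaNetOut

out = RetinaNetOut(cls_score=[torch.rand(1, 720, 64, 64)],
                   bbox_pred=[torch.rand(1, 36, 64, 64)])
assert out.cls_score is out[0]  # field number 0
assert out.bbox_pred is out[1]  # field number 1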

decode_multi_level_outputs(cls_out_all, lbl_out_all, reg_out_all, anchors_all, image_hw, box_decoder, max_per_img=1000, nms_threshold=0.7, min_box_size=(0, 0))[source]

Decode box energies into detections for a single image.

Detections are post-processed via NMS, which is performed per level; afterwards, the top-k detections are selected.

Parameters:
  • cls_out_all (list[torch.Tensor]) – Top-k class scores per level.

  • lbl_out_all (list[torch.Tensor]) – Top-k class labels per level.

  • reg_out_all (list[torch.Tensor]) – Top-k regression params per level.

  • anchors_all (list[torch.Tensor]) – Top-k anchor boxes per level.

  • image_hw (tuple[int, int]) – Image size as (height, width).

  • box_decoder (DeltaXYWHBBoxDecoder) – Bounding box decoder.

  • max_per_img (int, optional) – Maximum predictions per image. Defaults to 1000.

  • nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.7.

  • min_box_size (tuple[int, int], optional) – Minimum box size. Defaults to (0, 0).

Returns:

Decoded proposal boxes and scores.

Return type:

tuple[torch.Tensor, torch.Tensor]
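A sketch with random per-level inputs; the number of levels and the per-level entry count k are assumptions, and shapes follow the parameter docs above.

import torch
from vis4d.op.detect.retinanet import (
    decode_multi_level_outputs,
    get_default_box_codec,
)

_, box_decoder = get_default_box_codec()

k, num_levels = 100, 5  # per-level top-k count and level count (assumptions)
cls_out_all = [torch.rand(k) for _ in range(num_levels)]
lbl_out_all = [torch.randint(0, 80, (k,)) for _ in range(num_levels)]
reg_out_all = [torch.rand(k, 4) for _ in range(num_levels)]
# Random anchors are placeholders only, not geometrically valid boxes.
anchors_all = [torch.rand(k, 4) * 512 for _ in range(num_levels)]

boxes, scores = decode_multi_level_outputs(
    cls_out_all, lbl_out_all, reg_out_all, anchors_all,
    image_hw=(512, 512), box_decoder=box_decoder,
)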

get_default_anchor_generator()[source]

Get default anchor generator.

Return type:

AnchorGenerator

get_default_box_codec()[source]

Get the default bounding box encoder and decoder.

Return type:

tuple[DeltaXYWHBBoxEncoder, DeltaXYWHBBoxDecoder]

get_default_box_matcher()[source]

Get default bounding box matcher.

Return type:

MaxIoUMatcher

get_default_box_sampler()[source]

Get default bounding box sampler.

Return type:

PseudoSampler
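The four default components can be fetched together; the comments restate the return types documented above.

from vis4d.op.detect.retinanet import (
    get_default_anchor_generator,
    get_default_box_codec,
    get_default_box_matcher,
    get_default_box_sampler,
)

anchor_generator = get_default_anchor_generator()   # AnchorGenerator
box_encoder, box_decoder = get_default_box_codec()  # encoder/decoder pair
box_matcher = get_default_box_matcher()             # MaxIoUMatcher
box_sampler = get_default_box_sampler()             # PseudoSampler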

get_params_per_level(cls_out, reg_out, anchors, num_pre_nms=2000, score_thr=0.0)[source]

Get top-k params from the feature output per level, per image, before NMS.

Params include flattened classification scores, box energies, and corresponding anchors.

Parameters:
  • cls_out (torch.Tensor) – [C, H, W] classification scores at a particular scale.

  • reg_out (torch.Tensor) – [C, H, W] regression parameters at a particular scale.

  • anchors (torch.Tensor) – [H * W, 4] anchor boxes per cell.

  • num_pre_nms (int) – Number of predictions kept before NMS.

  • score_thr (float) – Score threshold for filtering predictions.

Returns:

Top-k flattened classification scores, class labels, regression outputs, and corresponding anchors.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
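A per-level sketch; the shapes follow the parameter docs above, a single anchor per cell is assumed so the tensor sizes line up simply, and the four output names are assumptions.

import torch
from vis4d.op.detect.retinanet import get_params_per_level

# With one anchor per cell: 80 class channels, 4 regression channels,
# and H * W anchors (all values are illustrative assumptions).
H, W = 32, 32
cls_out = torch.rand(80, H, W)
reg_out = torch.rand(4, H, W)
anchors = torch.rand(H * W, 4)  # random placeholders, not valid geometry

# The docs specify four returned tensors; these names are assumptions.
scores, labels, deltas, kept_anchors = get_params_per_level(
    cls_out, reg_out, anchors, num_pre_nms=2000, score_thr=0.0
)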