vis4d.op.detect.yolox

YOLOX detection head.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Functions

bbox_xyxy_to_cxcywh(bbox)

Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).

bboxes_nms(cls_scores, bboxes, objectness[, ...])

Decode box energies into detections for a single image.

get_default_point_generator()

Get default point generator.

get_l1_target(bbox_target, priors[, eps])

Convert gt bboxes to center offset and log width height.

preprocess_outputs(cls_outs, reg_outs, ...)

Preprocess model outputs before postprocessing/loss computation.

Classes

YOLOXHead(num_classes, in_channels[, ...])

YOLOX Head.

YOLOXHeadLoss(num_classes[, ...])

Loss of YOLOX head.

YOLOXHeadLosses(loss_cls, loss_bbox, ...)

YOLOX head loss container.

YOLOXOut(cls_score, bbox_pred, objectness)

YOLOX head outputs.

YOLOXPostprocess(point_generator, box_decoder)

Postprocess detections from YOLOX detection head.

class YOLOXHead(num_classes, in_channels, feat_channels=256, stacked_convs=2, strides=(8, 16, 32), point_generator=None, box_decoder=None)

YOLOX Head.

Parameters:
  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of input channels.

  • feat_channels (int, optional) – Number of feature channels. Defaults to 256.

  • stacked_convs (int, optional) – Number of stacked convolutions. Defaults to 2.

  • strides (Sequence[int], optional) – Strides for each feature level. Defaults to (8, 16, 32).

  • point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.

  • box_decoder (YOLOXBBoxDecoder, optional) – Bounding box decoder. Defaults to None.

Creates an instance of the class.

__call__(features)

Type definition for call implementation.

Return type:

YOLOXOut

forward(features)

Forward pass of YOLOX head.

Parameters:

features (list[torch.Tensor]) – Input features.

Returns:

Classification, box, and objectness predictions.

Return type:

YOLOXOut
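As a shape-only illustration of what YOLOXOut holds, the sketch below computes the per-level output shapes for a batch. It assumes the input height and width are divisible by every stride. The helper names yolox_output_shapes and num_priors are hypothetical. The per-level layouts ([N, C, H, W], [N, 4, H, W], [N, 1, H, W]) come from the YOLOXPostprocess.forward parameter descriptions.

```python
def yolox_output_shapes(batch_size, num_classes, image_hw, strides=(8, 16, 32)):
    """Per-level (cls_score, bbox_pred, objectness) shapes for a YOLOX-style head."""
    height, width = image_hw
    shapes = []
    for stride in strides:
        # Assumes the input size is divisible by every stride.
        h, w = height // stride, width // stride
        shapes.append(
            (
                (batch_size, num_classes, h, w),  # cls_score: [N, C, H, W]
                (batch_size, 4, h, w),  # bbox_pred: [N, 4, H, W]
                (batch_size, 1, h, w),  # objectness: [N, 1, H, W]
            )
        )
    return shapes


def num_priors(image_hw, strides=(8, 16, 32)):
    """Total number of grid locations (one point prior each) over all levels."""
    height, width = image_hw
    return sum((height // s) * (width // s) for s in strides)


shapes = yolox_output_shapes(batch_size=2, num_classes=80, image_hw=(640, 640))
print(shapes[0][0])  # (2, 80, 80, 80)
print(num_priors((640, 640)))  # 8400
```

With the default strides, a 640 x 640 input yields 80 x 80, 40 x 40, and 20 x 20 maps, for 8400 locations in total.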

class YOLOXHeadLoss(num_classes, point_generator=None, box_decoder=None, loss_cls=<function binary_cross_entropy_with_logits>, loss_bbox=IoULoss(), loss_obj=<function binary_cross_entropy_with_logits>, loss_l1=None)

Loss of YOLOX head.

Creates an instance of the class.

Parameters:
  • num_classes (int) – Number of classes.

  • point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.

  • box_decoder (YOLOXBBoxDecoder, optional) – Box decoder. Defaults to None.

  • loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to binary_cross_entropy_with_logits.

  • loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to IoULoss().

  • loss_obj (TorchLossFunc, optional) – Objectness loss function. Defaults to binary_cross_entropy_with_logits.

  • loss_l1 (TorchLossFunc | None, optional) – L1 loss function for box regression. Defaults to None. Only used during the final few training epochs.

__call__(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)

Type definition.

Return type:

YOLOXHeadLosses

forward(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)

Compute YOLOX classification, regression, and objectness losses.

Parameters:
  • cls_outs (list[Tensor]) – Network classification outputs at all scales.

  • reg_outs (list[Tensor]) – Network regression outputs at all scales.

  • obj_outs (list[Tensor]) – Network objectness outputs at all scales.

  • target_boxes (list[Tensor]) – Target bounding boxes.

  • target_class_ids (list[Tensor]) – Target class labels.

  • images_hw (list[tuple[int, int]]) – Image dimensions without padding.

Returns:

YOLOX losses.

Return type:

YOLOXHeadLosses

class YOLOXHeadLosses(loss_cls: Tensor, loss_bbox: Tensor, loss_obj: Tensor, loss_l1: Tensor | None)

YOLOX head loss container.

Create new instance of YOLOXHeadLosses(loss_cls, loss_bbox, loss_obj, loss_l1)

loss_bbox: Tensor

Alias for field number 1

loss_cls: Tensor

Alias for field number 0

loss_l1: Tensor | None

Alias for field number 3

loss_obj: Tensor

Alias for field number 2
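Because YOLOXHeadLosses is a named tuple, each loss can be read by name or by field number, and loss_l1 may be None. The sketch below uses a hypothetical stand-in class, LossesSketch, with plain floats in place of tensors. It also shows one way to sum the terms while skipping an absent L1 loss. That reduction is an assumption for illustration, not necessarily how vis4d combines the losses.

```python
from typing import NamedTuple, Optional


class LossesSketch(NamedTuple):
    """Hypothetical stand-in with the same field order as YOLOXHeadLosses."""

    loss_cls: float  # field number 0
    loss_bbox: float  # field number 1
    loss_obj: float  # field number 2
    loss_l1: Optional[float]  # field number 3, None when no L1 loss is used


def total_loss(losses):
    """Sum every loss term, skipping loss_l1 when it is None."""
    return sum(value for value in losses if value is not None)


losses = LossesSketch(loss_cls=0.5, loss_bbox=1.25, loss_obj=0.25, loss_l1=None)
print(losses[1] == losses.loss_bbox)  # True
print(total_loss(losses))  # 2.0
```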

class YOLOXOut(cls_score: list[torch.Tensor], bbox_pred: list[torch.Tensor], objectness: list[torch.Tensor])

YOLOX head outputs.

Create new instance of YOLOXOut(cls_score, bbox_pred, objectness)

bbox_pred: list[Tensor]

Alias for field number 1

cls_score: list[Tensor]

Alias for field number 0

objectness: list[Tensor]

Alias for field number 2

class YOLOXPostprocess(point_generator, box_decoder, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)

Postprocess detections from YOLOX detection head.

Creates an instance of the class.

Parameters:
  • point_generator (MlvlPointGenerator) – Point generator.

  • box_decoder (YOLOXBBoxDecoder) – Box decoder.

  • nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.

  • score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.

  • nms_pre (int, optional) – Number of top-k candidates kept before NMS. Defaults to -1 (all).

  • max_per_img (int, optional) – Number of top-k detections kept after NMS. Defaults to -1 (all).

__call__(cls_outs, reg_outs, obj_outs, images_hw)

Type definition for function call.

Return type:

DetOut

forward(cls_outs, reg_outs, obj_outs, images_hw)

Forward pass: convert the multi-scale head outputs into detections.

Parameters:
  • cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.

  • reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.

  • obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.

  • images_hw (list[tuple[int, int]]) – List of image sizes as (height, width).

Returns:

Detection outputs.

Return type:

DetOut

bbox_xyxy_to_cxcywh(bbox)

Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).

Parameters:

bbox (Tensor) – Shape (n, 4) for bboxes.

Returns:

Converted bboxes.

Return type:

Tensor
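The conversion is plain arithmetic on each box: the center is the midpoint of the corners, and the size is their difference. Below is a minimal, framework-free sketch on Python tuples. The names xyxy_to_cxcywh and boxes_xyxy_to_cxcywh are hypothetical. The documented function applies the same formula to an (n, 4) tensor.

```python
def xyxy_to_cxcywh(box):
    """Convert one (x1, y1, x2, y2) box to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def boxes_xyxy_to_cxcywh(boxes):
    """Apply the conversion to each row of an (n, 4) list of boxes."""
    return [xyxy_to_cxcywh(box) for box in boxes]


print(boxes_xyxy_to_cxcywh([(10, 20, 50, 80), (0, 0, 4, 2)]))
# [(30.0, 50.0, 40, 60), (2.0, 1.0, 4, 2)]
```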

bboxes_nms(cls_scores, bboxes, objectness, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)

Decode box energies into detections for a single image.

Detections are post-processed via NMS. NMS is performed per level. Afterwards, select topk detections.

Parameters:
  • cls_scores (torch.Tensor) – Class scores of the candidate boxes.

  • bboxes (torch.Tensor) – Candidate bounding boxes.

  • objectness (torch.Tensor) – Objectness scores of the candidate boxes.

  • nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.

  • score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.

  • nms_pre (int, optional) – Number of top-k candidates kept before NMS. Defaults to -1 (all).

  • max_per_img (int, optional) – Number of top-k detections kept after NMS. Defaults to -1 (all).

Returns:

Decoded boxes, scores, and labels.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
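The parameters above map onto a familiar pipeline: score each candidate, drop low scores, keep the top nms_pre, suppress overlaps, then keep the top max_per_img. The following is a framework-free sketch of that general technique for a single image, not vis4d's implementation. Several details are assumptions: the scoring rule (best class probability times objectness), the class-aware suppression, and the order of the steps. The names iou and nms_sketch are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(cls_scores, bboxes, objectness, nms_threshold=0.65,
               score_thr=0.01, nms_pre=-1, max_per_img=-1):
    """Score filtering plus class-aware greedy NMS for a single image.

    cls_scores: per-box lists of class probabilities.
    bboxes: per-box (x1, y1, x2, y2) tuples.
    objectness: per-box objectness probabilities.
    """
    candidates = []
    for probs, box, obj in zip(cls_scores, bboxes, objectness):
        label = max(range(len(probs)), key=lambda c: probs[c])
        # Assumed scoring rule: best class probability times objectness.
        score = probs[label] * obj
        if score >= score_thr:
            candidates.append((score, label, box))
    candidates.sort(key=lambda cand: cand[0], reverse=True)
    if nms_pre > 0:  # -1 keeps every candidate
        candidates = candidates[:nms_pre]
    kept = []
    for score, label, box in candidates:
        # Suppressed only by an already kept box of the same class with IoU > threshold.
        if all(k_label != label or iou(box, k_box) <= nms_threshold
               for _, k_label, k_box in kept):
            kept.append((score, label, box))
    if max_per_img > 0:  # -1 keeps every survivor
        kept = kept[:max_per_img]
    boxes = [box for _, _, box in kept]
    scores = [score for score, _, _ in kept]
    labels = [label for _, label, _ in kept]
    return boxes, scores, labels


boxes, scores, labels = nms_sketch(
    cls_scores=[[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.02, 0.01]],
    bboxes=[(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10), (50, 50, 60, 60)],
    objectness=[1.0, 0.9, 1.0, 0.4],
)
print(labels)  # [0, 1]
print(scores)  # [0.9, 0.7]
```

Here the second box overlaps the first with IoU of about 0.68 and is suppressed. The third box covers the same area but belongs to another class, so it survives. The last box falls below score_thr.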

get_default_point_generator()

Get default point generator.

Return type:

MlvlPointGenerator
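A point generator places one prior at every feature-map location of every level. The sketch below is plain Python and rests on several assumptions. The default generator is assumed to use the YOLOXHead default strides (8, 16, 32) and an offset of 0, as mmdetection's YOLOX head configures its MlvlPointGenerator. Each prior is assumed to carry its stride as the last two entries, (x, y, stride_w, stride_h). The names grid_priors and mlvl_priors are hypothetical.

```python
def grid_priors(featmap_hw, stride, offset=0.0):
    """Point priors (x, y, stride_w, stride_h) for one level, in row-major order."""
    height, width = featmap_hw
    return [
        ((x + offset) * stride, (y + offset) * stride, stride, stride)
        for y in range(height)
        for x in range(width)
    ]


def mlvl_priors(featmap_sizes, strides=(8, 16, 32), offset=0.0):
    """Concatenate the priors of all levels, finest level first."""
    priors = []
    for featmap_hw, stride in zip(featmap_sizes, strides):
        priors.extend(grid_priors(featmap_hw, stride, offset))
    return priors


print(grid_priors((2, 2), stride=32))
# [(0.0, 0.0, 32, 32), (32.0, 0.0, 32, 32), (0.0, 32.0, 32, 32), (32.0, 32.0, 32, 32)]
print(len(mlvl_priors([(8, 8), (4, 4), (2, 2)])))  # 84
```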

get_l1_target(bbox_target, priors, eps=1e-08)

Convert gt bboxes to center offset and log width height.

Parameters:
  • bbox_target (Tensor) – Shape (n, 4) for ground-truth bboxes.

  • priors (Tensor) – Shape (n, 4) for prior boxes.

  • eps (float, optional) – Epsilon for numerical stability. Defaults to 1e-8.

Return type:

Tensor
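Per box, the conversion expresses the ground-truth center as an offset from the prior in units of the stride, and the width and height as logs of their ratio to the stride. The sketch below assumes the prior layout is (px, py, stride_w, stride_h) and follows mmdetection's YOLOX L1-target formula. The source only states that priors have shape (n, 4). The name l1_target is hypothetical.

```python
import math


def l1_target(gt_box, prior, eps=1e-8):
    """Encode an (x1, y1, x2, y2) gt box relative to a (px, py, stride_w, stride_h) prior."""
    x1, y1, x2, y2 = gt_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    px, py, stride_w, stride_h = prior
    return (
        (cx - px) / stride_w,  # center offset in units of the stride
        (cy - py) / stride_h,
        math.log(w / stride_w + eps),  # log of width relative to the stride
        math.log(h / stride_h + eps),  # eps guards against log(0)
    )


target = l1_target(gt_box=(16, 16, 48, 80), prior=(24, 40, 8, 8))
print([round(v, 3) for v in target])  # [1.0, 1.0, 1.386, 2.079]
```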

preprocess_outputs(cls_outs, reg_outs, obj_outs, images_hw, point_generator, box_decoder)

Preprocess model outputs before postprocessing/loss computation.

Parameters:
  • cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.

  • reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.

  • obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.

  • images_hw (list[tuple[int, int]]) – List of image sizes as (height, width).

  • point_generator (MlvlPointGenerator) – Point generator.

  • box_decoder (YOLOXBBoxDecoder) – Box decoder.

Returns:

Flattened outputs.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
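preprocess_outputs receives a box_decoder together with the raw regression outputs and the priors. The sketch below assumes YOLOXBBoxDecoder follows mmdetection's YOLOX decoding, which inverts the get_l1_target encoding. The center is the predicted offset times the stride plus the prior location. The size is exp(prediction) times the stride. The prior layout (px, py, stride_w, stride_h) and the name decode_box are assumptions for illustration.

```python
import math


def decode_box(pred, prior):
    """Decode (dx, dy, log_w, log_h) relative to a (px, py, stride_w, stride_h) prior."""
    dx, dy, log_w, log_h = pred
    px, py, stride_w, stride_h = prior
    cx = dx * stride_w + px  # center = scaled offset + prior location
    cy = dy * stride_h + py
    w = math.exp(log_w) * stride_w  # size = exp(prediction) * stride
    h = math.exp(log_h) * stride_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


box = decode_box((1.0, 1.0, math.log(4), math.log(8)), prior=(24, 40, 8, 8))
print([round(v, 6) for v in box])  # [16.0, 16.0, 48.0, 80.0]
```

The example recovers the (16, 16, 48, 80) box, so encoding a ground-truth box as an L1 target and decoding it again returns the original corners, up to the eps term.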