vis4d.op.detect.yolox

YOLOX detection head.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Functions

`bbox_xyxy_to_cxcywh`(bbox)	Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).
`bboxes_nms`(cls_scores, bboxes, objectness[, ...])	Filter decoded boxes by score and apply NMS for a single image.
`get_default_point_generator`()	Get default point generator.
`get_l1_target`(bbox_target, priors[, eps])	Convert gt bboxes to center offset and log width height.
`preprocess_outputs`(cls_outs, reg_outs, ...)	Preprocess model outputs before postprocessing/loss computation.

Classes

`YOLOXHead`(num_classes, in_channels[, ...])	YOLOX Head.
`YOLOXHeadLoss`(num_classes[, ...])	Loss of YOLOX head.
`YOLOXHeadLosses`(loss_cls, loss_bbox, ...)	YOLOX head loss container.
`YOLOXOut`(cls_score, bbox_pred, objectness)	YOLOX head outputs.
`YOLOXPostprocess`(point_generator, box_decoder)	Postprocess detections from YOLOX detection head.

class YOLOXHead(num_classes, in_channels, feat_channels=256, stacked_convs=2, strides=(8, 16, 32), point_generator=None, box_decoder=None)

YOLOX Head.

Parameters:

num_classes (int) – Number of classes.
in_channels (int) – Number of input channels.
feat_channels (int, optional) – Number of feature channels. Defaults to 256.
stacked_convs (int, optional) – Number of stacked convolutions. Defaults to 2.
strides (Sequence[int], optional) – Strides for each feature level. Defaults to (8, 16, 32).
point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.
box_decoder (YOLOXBBoxDecoder, optional) – Bounding box decoder. Defaults to None.

Creates an instance of the class.

__call__(features)

Type definition for call implementation.

Return type: YOLOXOut

forward(features)

Forward pass of YOLOX head.

Parameters: features (list[torch.Tensor]) – Input features, one tensor per feature level.
Returns: Per-level classification, box regression, and objectness predictions.
Return type: YOLOXOut
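For orientation, here is a minimal sketch of a YOLOX-style decoupled head in plain PyTorch. `TinyDecoupledHead` is hypothetical and much simpler than `YOLOXHead` (one 3x3 conv per branch, no normalization); it only illustrates the structure, a per-level stem feeding a classification branch and a regression branch whose features also produce objectness, and it returns three per-level lists in the same order as the fields of `YOLOXOut`.

```python
import torch
from torch import nn


class TinyDecoupledHead(nn.Module):
    """Hypothetical YOLOX-style decoupled head; not the vis4d YOLOXHead."""

    def __init__(self, num_classes: int, in_channels: int,
                 feat_channels: int = 32, num_levels: int = 3) -> None:
        super().__init__()

        def branch() -> nn.Sequential:
            # A single 3x3 conv + SiLU; a real head would stack several (cf. stacked_convs).
            return nn.Sequential(
                nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.SiLU()
            )

        self.stems = nn.ModuleList(
            [nn.Conv2d(in_channels, feat_channels, 1) for _ in range(num_levels)]
        )
        self.cls_convs = nn.ModuleList([branch() for _ in range(num_levels)])
        self.reg_convs = nn.ModuleList([branch() for _ in range(num_levels)])
        self.cls_preds = nn.ModuleList(
            [nn.Conv2d(feat_channels, num_classes, 1) for _ in range(num_levels)]
        )
        self.reg_preds = nn.ModuleList(
            [nn.Conv2d(feat_channels, 4, 1) for _ in range(num_levels)]
        )
        self.obj_preds = nn.ModuleList(
            [nn.Conv2d(feat_channels, 1, 1) for _ in range(num_levels)]
        )

    def forward(self, features):
        cls_scores, bbox_preds, objectness = [], [], []
        for i, x in enumerate(features):
            x = self.stems[i](x)
            cls_feat = self.cls_convs[i](x)
            reg_feat = self.reg_convs[i](x)  # shared by box and objectness outputs
            cls_scores.append(self.cls_preds[i](cls_feat))  # [N, num_classes, H, W]
            bbox_preds.append(self.reg_preds[i](reg_feat))  # [N, 4, H, W]
            objectness.append(self.obj_preds[i](reg_feat))  # [N, 1, H, W]
        return cls_scores, bbox_preds, objectness


head = TinyDecoupledHead(num_classes=80, in_channels=64)
# Features for a 256x256 input at strides 8, 16, 32.
features = [torch.randn(2, 64, s, s) for s in (32, 16, 8)]
cls_scores, bbox_preds, objectness = head(features)
```

Keeping objectness on the regression branch follows the decoupled-head design described for YOLOX, where localization quality is predicted from the same features as the box.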

class YOLOXHeadLoss(num_classes, point_generator=None, box_decoder=None, loss_cls=<function binary_cross_entropy_with_logits>, loss_bbox=IoULoss(), loss_obj=<function binary_cross_entropy_with_logits>, loss_l1=None)

Loss of YOLOX head.

Creates an instance of the class.

Parameters:

num_classes (int) – Number of classes.
point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.
box_decoder (YOLOXBBoxDecoder, optional) – Box decoder. Defaults to None.
loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to binary_cross_entropy_with_logits.
loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to IoULoss().
loss_obj (TorchLossFunc, optional) – Objectness loss function. Defaults to binary_cross_entropy_with_logits.
loss_l1 (TorchLossFunc | None, optional) – L1 loss function. Defaults to None, which disables the L1 term. Intended to be enabled only during the final few epochs of training.
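The signature's default `loss_bbox=IoULoss()` is an IoU-based regression loss. As a sketch only, the linear variant 1 - IoU on paired (x1, y1, x2, y2) boxes can be written as below; `iou_loss` is a hypothetical name, and vis4d's `IoULoss` may use a different mode, epsilon, or reduction.

```python
import torch
from torch import Tensor


def iou_loss(pred: Tensor, target: Tensor, eps: float = 1e-6) -> Tensor:
    """Hypothetical linear IoU loss: mean of (1 - IoU) over paired xyxy boxes."""
    # Intersection rectangle of each (pred, target) pair, clamped at zero size.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=eps)
    return (1.0 - iou).mean()


pred = torch.tensor([[0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0]])
loss = iou_loss(pred, target)  # pair 1: IoU 1, pair 2: IoU 1/7
```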

__call__(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)

Type definition.

Return type: YOLOXHeadLosses

forward(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)

Compute YOLOX classification, regression, and objectness losses.

Parameters:

cls_outs (list[Tensor]) – Network classification outputs at all scales.
reg_outs (list[Tensor]) – Network regression outputs at all scales.
obj_outs (list[Tensor]) – Network objectness outputs at all scales.
target_boxes (list[Tensor]) – Target bounding boxes.
target_class_ids (list[Tensor]) – Target class labels.
images_hw (list[tuple[int, int]]) – Image dimensions without padding.

Returns:

YOLOX losses.

Return type:

YOLOXHeadLosses

class YOLOXHeadLosses(loss_cls: Tensor, loss_bbox: Tensor, loss_obj: Tensor, loss_l1: Tensor | None)

YOLOX head loss container.

Create new instance of YOLOXHeadLosses(loss_cls, loss_bbox, loss_obj, loss_l1)

loss_bbox: Tensor – Alias for field number 1

loss_cls: Tensor – Alias for field number 0

loss_l1: Tensor | None – Alias for field number 3

loss_obj: Tensor – Alias for field number 2
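Because `YOLOXHeadLosses` is a named tuple whose `loss_l1` field may be None, a training loop has to skip that field when forming a total loss. A minimal sketch with a stand-in tuple `HeadLosses` (hypothetical; it mirrors the documented field order), assuming every term is a scalar tensor and the terms are summed with equal weight:

```python
from typing import NamedTuple, Optional

import torch
from torch import Tensor


class HeadLosses(NamedTuple):
    """Hypothetical stand-in with the same field order as YOLOXHeadLosses."""

    loss_cls: Tensor
    loss_bbox: Tensor
    loss_obj: Tensor
    loss_l1: Optional[Tensor]


def total_loss(losses: HeadLosses) -> Tensor:
    """Sum all loss terms with equal weight, skipping loss_l1 when it is None."""
    return torch.stack([term for term in losses if term is not None]).sum()


losses = HeadLosses(torch.tensor(1.0), torch.tensor(2.0), torch.tensor(0.5), None)
assert losses[1] is losses.loss_bbox  # "Alias for field number 1"
total = total_loss(losses)  # loss_l1 is None, so only three terms contribute
```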

class YOLOXOut(cls_score: list[torch.Tensor], bbox_pred: list[torch.Tensor], objectness: list[torch.Tensor])

YOLOX head outputs.

Create new instance of YOLOXOut(cls_score, bbox_pred, objectness)

bbox_pred: list[Tensor] – Alias for field number 1

cls_score: list[Tensor] – Alias for field number 0

objectness: list[Tensor] – Alias for field number 2

class YOLOXPostprocess(point_generator, box_decoder, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)

Postprocess detections from YOLOX detection head.

Creates an instance of the class.

Parameters:

point_generator (MlvlPointGenerator) – Point generator.
box_decoder (YOLOXBBoxDecoder) – Box decoder.
nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.
score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.
nms_pre (int, optional) – Number of topk results before NMS. Defaults to -1 (all).
max_per_img (int, optional) – Number of topk results after NMS. Defaults to -1 (all).

__call__(cls_outs, reg_outs, obj_outs, images_hw)

Type definition for function call.

Return type: DetOut

forward(cls_outs, reg_outs, obj_outs, images_hw)

Forward pass: convert per-level head outputs into detections for each image.

Parameters:

cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.
reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.
obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.
images_hw (list[tuple[int, int]]) – List of image sizes as (height, width).

Returns:

Detection outputs.

Return type:

DetOut

bbox_xyxy_to_cxcywh(bbox)

Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).

Parameters: bbox (Tensor) – Shape (n, 4) for bboxes.
Returns: Converted bboxes.
Return type: Tensor
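A minimal sketch of the same conversion in plain PyTorch; `xyxy_to_cxcywh` is a hypothetical stand-in, not the vis4d function itself:

```python
import torch
from torch import Tensor


def xyxy_to_cxcywh(bbox: Tensor) -> Tensor:
    """Convert (..., 4) boxes from (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = bbox.unbind(-1)
    # Center is the midpoint of each side; width/height are the side lengths.
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)


boxes = torch.tensor([[10.0, 20.0, 30.0, 60.0]])
converted = xyxy_to_cxcywh(boxes)  # center (20, 40), size 20 x 40
```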

bboxes_nms(cls_scores, bboxes, objectness, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)

Select final detections for a single image from decoded candidate boxes.

Candidates are filtered by score_thr, optionally reduced to the nms_pre highest-scoring ones, and post-processed via NMS. Afterwards, at most max_per_img detections are kept.

Parameters:

cls_scores (torch.Tensor) – Per-candidate class scores.
bboxes (torch.Tensor) – Decoded candidate boxes.
objectness (torch.Tensor) – Per-candidate objectness scores.
nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.
score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.
nms_pre (int, optional) – Number of topk results before NMS. Defaults to -1 (all).
max_per_img (int, optional) – Number of topk results after NMS. Defaults to -1 (all).

Returns:

Decoded boxes, scores, and labels.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
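The selection steps can be sketched in plain PyTorch. The sketch assumes, following mmdetection's YOLOX head, that `cls_scores` is an (n, num_classes) tensor of probabilities, `objectness` an (n,) tensor of probabilities, that the final score is the best class score times the objectness, and that NMS is class-aware. The helpers `box_iou`, `greedy_nms`, and `select_detections` are hypothetical and avoid torchvision only to stay self-contained.

```python
import torch
from torch import Tensor


def box_iou(a: Tensor, b: Tensor) -> Tensor:
    """Pairwise IoU between (n, 4) and (m, 4) xyxy boxes, shape (n, m)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter).clamp(min=1e-6)


def greedy_nms(boxes: Tensor, scores: Tensor, labels: Tensor, iou_thr: float) -> Tensor:
    """Class-aware greedy NMS; returns kept indices in descending score order."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        iou = box_iou(boxes[i].unsqueeze(0), boxes[rest]).squeeze(0)
        # Only suppress overlapping boxes that share the kept box's class.
        suppress = (iou > iou_thr) & (labels[rest] == labels[i])
        order = rest[~suppress]
    return torch.stack(keep) if keep else torch.empty(0, dtype=torch.long)


def select_detections(cls_scores: Tensor, bboxes: Tensor, objectness: Tensor,
                      nms_threshold: float = 0.65, score_thr: float = 0.01,
                      nms_pre: int = -1, max_per_img: int = -1):
    """Score filter -> optional pre-NMS top-k -> class-aware NMS -> top-k."""
    max_scores, labels = cls_scores.max(dim=1)
    scores = max_scores * objectness
    valid = scores >= score_thr
    boxes, scores, labels = bboxes[valid], scores[valid], labels[valid]
    if nms_pre > 0:
        scores, idx = scores.topk(min(nms_pre, scores.numel()))
        boxes, labels = boxes[idx], labels[idx]
    keep = greedy_nms(boxes, scores, labels, nms_threshold)
    if max_per_img > 0:
        keep = keep[:max_per_img]
    return boxes[keep], scores[keep], labels[keep]


cls_scores = torch.tensor([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.005, 0.001]])
objectness = torch.ones(4)
bboxes = torch.tensor([
    [0.0, 0.0, 10.0, 10.0],    # class 0, highest score
    [1.0, 1.0, 11.0, 11.0],    # class 0, IoU ~0.68 with the first box
    [0.0, 0.0, 10.0, 10.0],    # class 1, same location as the first box
    [50.0, 50.0, 60.0, 60.0],  # score below score_thr
])
boxes, scores, labels = select_detections(cls_scores, bboxes, objectness)
```

In practice, `torchvision.ops.batched_nms(boxes, scores, idxs, iou_threshold)` performs the same class-aware suppression far more efficiently than the Python loop above.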

get_default_point_generator()

Get default point generator.

Return type: MlvlPointGenerator
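The configuration of the returned MlvlPointGenerator is not documented here. To illustrate what a multi-level point generator produces, the sketch below builds per-level grids of prior points, assuming the head's default strides (8, 16, 32), an offset of 0 (as in mmdetection's YOLOX head), and rows laid out as (x, y, stride_w, stride_h). The function `grid_priors` is hypothetical, not a vis4d or mmdetection API.

```python
from typing import List, Sequence, Tuple

import torch
from torch import Tensor


def grid_priors(
    featmap_sizes: Sequence[Tuple[int, int]],
    strides: Sequence[int] = (8, 16, 32),
    offset: float = 0.0,
) -> List[Tensor]:
    """Return one (H*W, 4) tensor of (x, y, stride_w, stride_h) rows per level."""
    priors = []
    for (h, w), s in zip(featmap_sizes, strides):
        xs = (torch.arange(w, dtype=torch.float32) + offset) * s
        ys = (torch.arange(h, dtype=torch.float32) + offset) * s
        # "ij" indexing gives (H, W) grids; flattening them row-major makes x vary fastest.
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        stride = torch.full_like(xx, float(s))
        priors.append(torch.stack((xx, yy, stride, stride), dim=-1).reshape(-1, 4))
    return priors


# Feature map sizes for a 32x32 input at strides 8, 16, 32.
priors = grid_priors([(4, 4), (2, 2), (1, 1)])
```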

get_l1_target(bbox_target, priors, eps=1e-08)

Convert ground-truth boxes to center offsets and log width and height relative to the priors, for use as L1 loss targets.

Parameters:

bbox_target (Tensor) – Shape (n, 4) for ground-truth bboxes.
priors (Tensor) – Shape (n, 4) for prior boxes.
eps (float, optional) – Epsilon for numerical stability. Defaults to 1e-8.

Return type:

Tensor
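A sketch of the encoding, assuming (as in mmdetection's YOLOX head) that each prior row is (x, y, stride_w, stride_h): the center offset is divided by the stride, and the width and height are divided by the stride before taking the log. `l1_target` is a hypothetical stand-in for `get_l1_target`.

```python
import torch
from torch import Tensor


def l1_target(bbox_target: Tensor, priors: Tensor, eps: float = 1e-8) -> Tensor:
    """Encode xyxy ground-truth boxes relative to (x, y, stride_w, stride_h) priors."""
    x1, y1, x2, y2 = bbox_target.unbind(-1)
    gt_cxcy = torch.stack(((x1 + x2) / 2, (y1 + y2) / 2), dim=-1)
    gt_wh = torch.stack((x2 - x1, y2 - y1), dim=-1)
    offsets = (gt_cxcy - priors[:, :2]) / priors[:, 2:]  # center offset in stride units
    log_wh = torch.log(gt_wh / priors[:, 2:] + eps)      # log size in stride units
    return torch.cat((offsets, log_wh), dim=-1)


# A 16x32 box centered at (16, 24) against a stride-8 prior at (8, 16).
gt = torch.tensor([[8.0, 8.0, 24.0, 40.0]])
prior = torch.tensor([[8.0, 16.0, 8.0, 8.0]])
target = l1_target(gt, prior)  # offsets (1, 1), log sizes (log 2, log 4)
```

Decoding these values with center = offset * stride + prior_xy and size = exp(log_size) * stride recovers the original box, which is why the L1 target lives in the same space as the raw regression outputs.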

preprocess_outputs(cls_outs, reg_outs, obj_outs, images_hw, point_generator, box_decoder)

Preprocess model outputs before postprocessing/loss computation.

Parameters:

cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.
reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.
obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.
images_hw (list[tuple[int, int]]) – List of image sizes.
point_generator (MlvlPointGenerator) – Point generator.
box_decoder (YOLOXBBoxDecoder) – Box decoder.

Returns:

Flattened outputs.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
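The documented input shapes suggest a flatten-then-decode pattern. The sketch below concatenates the per-level maps into [N, P, C] tensors (P = total number of priors) and decodes boxes against flattened priors, assuming the mmdetection YOLOX decoding (center = offset * stride + prior_xy, size = exp(log_size) * stride) and priors laid out as (x, y, stride_w, stride_h) in the same row-major order. `flatten_and_decode` and its four return values are hypothetical; the actual contents and order of the five tensors returned by `preprocess_outputs` are not specified here.

```python
from typing import List, Tuple

import torch
from torch import Tensor


def flatten_levels(outs: List[Tensor]) -> Tensor:
    """Concatenate [N, C, H, W] maps from all levels into [N, sum(H*W), C]."""
    n = outs[0].shape[0]
    return torch.cat(
        [o.permute(0, 2, 3, 1).reshape(n, -1, o.shape[1]) for o in outs], dim=1
    )


def decode_boxes(priors: Tensor, reg: Tensor) -> Tensor:
    """Decode [N, P, 4] raw offsets against (x, y, sw, sh) priors into xyxy boxes."""
    cxcy = reg[..., :2] * priors[:, 2:] + priors[:, :2]
    wh = reg[..., 2:].exp() * priors[:, 2:]
    return torch.cat((cxcy - wh / 2, cxcy + wh / 2), dim=-1)


def flatten_and_decode(cls_outs: List[Tensor], reg_outs: List[Tensor],
                       obj_outs: List[Tensor],
                       priors: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
    cls_flat = flatten_levels(cls_outs)              # [N, P, C] raw logits
    reg_flat = flatten_levels(reg_outs)              # [N, P, 4] raw offsets
    obj_flat = flatten_levels(obj_outs).squeeze(-1)  # [N, P] raw logits
    boxes = decode_boxes(priors, reg_flat)           # [N, P, 4] xyxy
    return cls_flat, reg_flat, obj_flat, boxes


# Two levels: a 1x2 map at stride 8 and a 1x1 map at stride 16, batch of 1, 3 classes.
priors = torch.tensor([[0.0, 0.0, 8.0, 8.0], [8.0, 0.0, 8.0, 8.0], [0.0, 0.0, 16.0, 16.0]])
cls_outs = [torch.randn(1, 3, 1, 2), torch.randn(1, 3, 1, 1)]
reg_outs = [torch.zeros(1, 4, 1, 2), torch.zeros(1, 4, 1, 1)]
obj_outs = [torch.zeros(1, 1, 1, 2), torch.zeros(1, 1, 1, 1)]
cls_flat, reg_flat, obj_flat, boxes = flatten_and_decode(cls_outs, reg_outs, obj_outs, priors)
```

With zero offsets every decoded box is centered on its prior and has the prior's stride as its width and height.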