vis4d.op.detect.yolox
YOLOX detection head.
Modified from mmdetection (https://github.com/open-mmlab/mmdetection).
Functions
bbox_xyxy_to_cxcywh – Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).
bboxes_nms – Decode box energies into detections for a single image.
Get default point generator.
get_l1_target – Convert gt bboxes to center offset and log width height.
preprocess_outputs – Preprocess model outputs before postprocessing/loss computation.
Classes
YOLOXHead – YOLOX Head.
YOLOXHeadLoss – Loss of YOLOX head.
YOLOXHeadLosses – YOLOX head loss container.
YOLOXOut – YOLOX head outputs.
YOLOXPostprocess – Postprocess detections from YOLOX detection head.
- class YOLOXHead(num_classes, in_channels, feat_channels=256, stacked_convs=2, strides=(8, 16, 32), point_generator=None, box_decoder=None)
YOLOX Head.
- Parameters:
num_classes (int) – Number of classes.
in_channels (int) – Number of input channels.
feat_channels (int, optional) – Number of feature channels. Defaults to 256.
stacked_convs (int, optional) – Number of stacked convolutions. Defaults to 2.
strides (Sequence[int], optional) – Strides for each feature level. Defaults to (8, 16, 32).
point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.
box_decoder (YOLOXBBoxDecoder, optional) – Bounding box decoder. Defaults to None.
Creates an instance of the class.
- class YOLOXHeadLoss(num_classes, point_generator=None, box_decoder=None, loss_cls=<function binary_cross_entropy_with_logits>, loss_bbox=IoULoss(), loss_obj=<function binary_cross_entropy_with_logits>, loss_l1=None)
Loss of YOLOX head.
Creates an instance of the class.
- Parameters:
num_classes (int) – Number of classes.
point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.
box_decoder (YOLOXBBoxDecoder, optional) – Box decoder. Defaults to None.
loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to binary_cross_entropy_with_logits.
loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to IoULoss().
loss_obj (TorchLossFunc, optional) – Objectness loss function. Defaults to binary_cross_entropy_with_logits.
loss_l1 (TorchLossFunc | None, optional) – L1 loss function. Defaults to None. Only used during the final few epochs.
- __call__(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)
Type definition for function call.
- Return type:
YOLOXHeadLosses
- forward(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)
Compute YOLOX classification, regression, and objectness losses.
- Parameters:
cls_outs (list[Tensor]) – Network classification outputs at all scales.
reg_outs (list[Tensor]) – Network regression outputs at all scales.
obj_outs (list[Tensor]) – Network objectness outputs at all scales.
target_boxes (list[Tensor]) – Target bounding boxes.
target_class_ids (list[Tensor]) – Target class labels.
images_hw (list[tuple[int, int]]) – Image dimensions without padding.
- Returns:
YOLOX losses.
- Return type:
YOLOXHeadLosses
- class YOLOXHeadLosses(loss_cls: Tensor, loss_bbox: Tensor, loss_obj: Tensor, loss_l1: Tensor | None)
YOLOX head loss container.
Create new instance of YOLOXHeadLosses(loss_cls, loss_bbox, loss_obj, loss_l1)
- loss_bbox: Tensor
Alias for field number 1
- loss_cls: Tensor
Alias for field number 0
- loss_l1: Tensor | None
Alias for field number 3
- loss_obj: Tensor
Alias for field number 2
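The "Alias for field number" entries indicate a named-tuple container, so each loss can be read by name or by position. The sketch below uses a hypothetical stand-in class, `HeadLosses`, with plain floats instead of tensors. The unweighted sum is only an illustration of how an optional `loss_l1` might be skipped, not the library's training logic.

```python
from typing import NamedTuple, Optional


class HeadLosses(NamedTuple):
    """Hypothetical stand-in for YOLOXHeadLosses (floats instead of tensors)."""

    loss_cls: float            # field number 0
    loss_bbox: float           # field number 1
    loss_obj: float            # field number 2
    loss_l1: Optional[float]   # field number 3, may be None


losses = HeadLosses(loss_cls=0.5, loss_bbox=1.25, loss_obj=0.25, loss_l1=None)

# Fields are reachable by name or by index.
same_value = losses.loss_bbox == losses[1]

# Illustrative unweighted total that ignores a missing L1 term.
total = sum(value for value in losses if value is not None)
print(same_value, total)
```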
- class YOLOXOut(cls_score: list[torch.Tensor], bbox_pred: list[torch.Tensor], objectness: list[torch.Tensor])
YOLOX head outputs.
Create new instance of YOLOXOut(cls_score, bbox_pred, objectness)
- bbox_pred: list[Tensor]
Alias for field number 1
- cls_score: list[Tensor]
Alias for field number 0
- objectness: list[Tensor]
Alias for field number 2
- class YOLOXPostprocess(point_generator, box_decoder, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)
Postprocess detections from YOLOX detection head.
Creates an instance of the class.
- Parameters:
point_generator (MlvlPointGenerator) – Point generator.
box_decoder (YOLOXBBoxDecoder) – Box decoder.
nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.
score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.
nms_pre (int, optional) – Number of topk results before NMS. Defaults to -1 (all).
max_per_img (int, optional) – Number of topk results after NMS. Defaults to -1 (all).
- __call__(cls_outs, reg_outs, obj_outs, images_hw)
Type definition for function call.
- Return type:
- forward(cls_outs, reg_outs, obj_outs, images_hw)
Forward pass.
- Parameters:
cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.
reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.
obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.
images_hw (list[tuple[int, int]]) – List of image sizes.
- Returns:
Detection outputs.
- Return type:
- bbox_xyxy_to_cxcywh(bbox)
Convert bbox coordinates from (x1, y1, x2, y2) to (cx, cy, w, h).
- Parameters:
bbox (Tensor) – Shape (n, 4) for bboxes.
- Returns:
Converted bboxes.
- Return type:
Tensor
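The conversion is simple arithmetic: the center is the midpoint of the two corners and the size is their difference. The following is a minimal sketch on plain Python tuples rather than an (n, 4) tensor. The helper name `xyxy_to_cxcywh` is hypothetical and is not the library function.

```python
def xyxy_to_cxcywh(box):
    """Convert one (x1, y1, x2, y2) box to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


boxes = [(10.0, 20.0, 50.0, 80.0), (0.0, 0.0, 4.0, 2.0)]
converted = [xyxy_to_cxcywh(box) for box in boxes]
print(converted)
```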
- bboxes_nms(cls_scores, bboxes, objectness, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)
Decode box energies into detections for a single image.
Detections are post-processed via NMS. NMS is performed per level. Afterwards, select topk detections.
- Parameters:
cls_scores (torch.Tensor) – Class scores of the candidate boxes.
bboxes (torch.Tensor) – Decoded bounding boxes.
objectness (torch.Tensor) – Objectness scores of the candidate boxes.
nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.
score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.
nms_pre (int, optional) – Number of topk results before NMS. Defaults to -1 (all).
max_per_img (int, optional) – Number of topk results after NMS. Defaults to -1 (all).
- Returns:
Decoded boxes, scores, and labels.
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
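To make the roles of `score_thr`, `nms_threshold`, and `max_per_img` concrete, here is a rough pure-Python sketch of score filtering followed by greedy per-class NMS. It rests on assumptions not stated above: the final score is taken as class score times objectness (the usual YOLOX convention), the filter keeps scores greater than or equal to `score_thr`, and suppression only applies between boxes with the same label. The helper names `iou` and `simple_nms` are hypothetical, and `nms_pre` is omitted for brevity.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, labels, nms_threshold=0.65, score_thr=0.01, max_per_img=-1):
    """Return indices of kept boxes, highest score first."""
    # Drop low-scoring candidates, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box unless a higher-scoring kept box of the same class overlaps too much.
        if all(labels[i] != labels[j] or iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep[:max_per_img] if max_per_img > 0 else keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 10, 10)]
cls_scores = [0.9, 0.8, 0.7, 0.5]
objectness = [1.0, 1.0, 0.5, 0.01]
labels = [0, 0, 0, 0]

# Assumed scoring rule: class score multiplied by objectness.
scores = [c * o for c, o in zip(cls_scores, objectness)]
kept = simple_nms(boxes, scores, labels)
print(kept)
```

In this toy input, the second box overlaps the first with IoU 0.81 and is suppressed, while the fourth falls below the score threshold.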
- get_l1_target(bbox_target, priors, eps=1e-08)
Convert gt bboxes to center offset and log width height.
- Parameters:
bbox_target (Tensor) – Shape (n, 4) for ground-truth bboxes.
priors (Tensor) – Shape (n, 4) for prior boxes.
eps (float, optional) – Epsilon for numerical stability. Defaults to 1e-8.
- Return type:
Tensor
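As an illustration of "center offset and log width height", the sketch below encodes a single ground-truth box against a single prior. It assumes each prior is laid out as (cx, cy, stride_w, stride_h), following the mmdetection convention this module is modified from; that layout is an assumption and is not confirmed by the text above. The helper name `l1_target` is hypothetical.

```python
import math


def l1_target(gt_box, prior, eps=1e-8):
    """Encode one (x1, y1, x2, y2) box relative to one prior."""
    x1, y1, x2, y2 = gt_box
    cx, cy, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
    px, py, stride_w, stride_h = prior  # assumed prior layout
    return (
        (cx - px) / stride_w,           # center offset in units of the stride
        (cy - py) / stride_h,
        math.log(w / stride_w + eps),   # log-scaled width and height
        math.log(h / stride_h + eps),
    )


target = l1_target((8.0, 8.0, 24.0, 40.0), (16.0, 16.0, 8.0, 8.0))
print(target)
```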
- preprocess_outputs(cls_outs, reg_outs, obj_outs, images_hw, point_generator, box_decoder)
Preprocess model outputs before postprocessing/loss computation.
- Parameters:
cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.
reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.
obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.
images_hw (list[tuple[int, int]]) – List of image sizes.
point_generator (MlvlPointGenerator) – Point generator.
box_decoder (YOLOXBBoxDecoder) – Box decoder.
- Returns:
Flattened outputs.
- Return type:
tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
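Each [N, C, H, W] map contributes H × W positions per image. Assuming the flattened outputs concatenate all levels along one axis, the number of positions can be estimated from the input size and the strides. The sketch below uses the default strides (8, 16, 32) from YOLOXHead and assumes the input height and width are divisible by every stride. The helper name `num_flattened_positions` is hypothetical.

```python
def num_flattened_positions(input_hw, strides=(8, 16, 32)):
    """Return per-level and total position counts for one image."""
    height, width = input_hw
    # Assumes height and width are exact multiples of each stride.
    per_level = [(height // s) * (width // s) for s in strides]
    return per_level, sum(per_level)


per_level, total = num_flattened_positions((640, 640))
print(per_level, total)
```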