bboxes_nms(cls_scores, bboxes, objectness[, ...])

get_l1_target(bbox_target, priors[, eps])

preprocess_outputs(cls_outs, reg_outs, ...)

YOLOXHead(num_classes, in_channels[, ...])


YOLOXHeadLoss(num_classes[, ...])

YOLOXHeadLosses(loss_cls, loss_bbox, ...)

YOLOXOut(cls_score, bbox_pred, objectness)

YOLOXPostprocess(point_generator, box_decoder)

class YOLOXHead(num_classes, in_channels, feat_channels=256, stacked_convs=2, strides=(8, 16, 32), point_generator=None, box_decoder=None)[source]


  • num_classes (int) – Number of classes.

  • in_channels (int) – Number of input channels.

  • feat_channels (int, optional) – Number of feature channels. Defaults to 256.

  • stacked_convs (int, optional) – Number of stacked convolutions. Defaults to 2.

  • strides (Sequence[int], optional) – Strides for each feature level. Defaults to (8, 16, 32).

  • point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.

  • box_decoder (YOLOXBBoxDecoder, optional) – Bounding box decoder. Defaults to None.

  • box_matcher (Matcher, optional) – Bounding box matcher. Defaults to None.

  • box_sampler (Sampler, optional) – Bounding box sampler. Defaults to None.
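As a rough illustration of what the strides parameter implies, the sketch below computes the size of the prediction grid at each feature level. The helper name points_per_level is hypothetical, and the ceil division is an assumption about how odd input sizes are handled; neither comes from this module.

```python
def points_per_level(image_hw, strides=(8, 16, 32)):
    """Return the (height, width) of the prediction grid at each stride.

    Hypothetical helper for illustration only, not part of the library.
    """
    h, w = image_hw
    # Ceil division: assumes inputs are padded up to a multiple of the stride.
    return [(-(-h // s), -(-w // s)) for s in strides]


grids = points_per_level((640, 640))
total = sum(gh * gw for gh, gw in grids)
print(grids)  # grid size per level
print(total)  # total number of prediction points over all levels
```

For a 640 x 640 input and the default strides, this gives grids of 80 x 80, 40 x 40, and 20 x 20, i.e. 8400 prediction points in total.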

Forward pass of YOLOX head.


features (list[torch.Tensor]) – Input features.


Classification, box, and objectness predictions.

class YOLOXHeadLoss(num_classes, point_generator=None, box_decoder=None, loss_cls=<function binary_cross_entropy_with_logits>, loss_bbox=IoULoss(), loss_obj=<function binary_cross_entropy_with_logits>, loss_l1=None)[source]

  • num_classes (int) – Number of classes.

  • point_generator (MlvlPointGenerator, optional) – Point generator. Defaults to None.

  • box_decoder (YOLOXBBoxDecoder, optional) – Box decoder. Defaults to None.

  • loss_cls (TorchLossFunc, optional) – Classification loss function. Defaults to binary_cross_entropy_with_logits.

  • loss_bbox (TorchLossFunc, optional) – Regression loss function. Defaults to IoULoss().

  • loss_obj (TorchLossFunc, optional) – Objectness loss function. Defaults to binary_cross_entropy_with_logits.

  • loss_l1 (TorchLossFunc | None, optional) – L1 loss function. Defaults to None. Only used during the final few epochs.
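The class signature lists binary_cross_entropy_with_logits as the default for loss_cls and loss_obj. The sketch below shows, for a single scalar, the numerically stable form in which this kind of loss is usually evaluated. It is a plain-Python illustration; the function name bce_with_logits is made up here and this is not the library's implementation.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, for one element.

    Equivalent to -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))],
    rewritten so that exp() never overflows for large |x|.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


print(bce_with_logits(0.0, 1.0))     # log(2), since sigmoid(0) = 0.5
print(bce_with_logits(1000.0, 1.0))  # confident and correct: loss is 0
print(bce_with_logits(2.0, 0.0))     # confident and wrong: larger loss
```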

__call__(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)[source]

forward(cls_outs, reg_outs, obj_outs, target_boxes, target_class_ids, images_hw)[source]

Compute YOLOX classification, regression, and objectness losses.

  • cls_outs (list[Tensor]) – Network classification outputs at all scales.

  • reg_outs (list[Tensor]) – Network regression outputs at all scales.

  • obj_outs (list[Tensor]) – Network objectness outputs at all scales.

  • target_boxes (list[Tensor]) – Target bounding boxes.

  • target_class_ids (list[Tensor]) – Target class labels.

  • images_hw (list[tuple[int, int]]) – Image dimensions without padding.


YOLOX losses.

class YOLOXHeadLosses(loss_cls: Tensor, loss_bbox: Tensor, loss_obj: Tensor, loss_l1: Tensor | None)[source]

loss_bbox: Tensor

loss_cls: Tensor

loss_l1: Tensor | None

loss_obj: Tensor

class YOLOXOut(cls_score: list[torch.Tensor], bbox_pred: list[torch.Tensor], objectness: list[torch.Tensor])[source]

bbox_pred: list[Tensor]

cls_score: list[Tensor]

objectness: list[Tensor]

class YOLOXPostprocess(point_generator, box_decoder, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)[source]

  • point_generator (MlvlPointGenerator) – Point generator.

  • box_decoder (YOLOXBBoxDecoder) – Box decoder.

  • nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.

  • score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.

  • nms_pre (int, optional) – Number of topk results before NMS. Defaults to -1 (all).

  • max_per_img (int, optional) – Number of topk results after NMS. Defaults to -1 (all).

__call__(cls_outs, reg_outs, obj_outs, images_hw)[source]

forward(cls_outs, reg_outs, obj_outs, images_hw)[source]

Forward pass.

  • cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.

  • reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.

  • obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.

  • images_hw (list[tuple[int, int]]) – List of image sizes as (height, width).


Detection outputs.


bbox (Tensor) – Shape (n, 4) for bboxes.


Converted bboxes.

bboxes_nms(cls_scores, bboxes, objectness, nms_threshold=0.65, score_thr=0.01, nms_pre=-1, max_per_img=-1)[source]

Post-process detections with NMS. NMS is performed per level; afterwards, the top-k detections are selected.

  • cls_scores (torch.Tensor) – Top-k class scores per level.

  • bboxes (torch.Tensor) – Top-k bounding boxes per level.

  • objectness (torch.Tensor) – Top-k objectness scores per level.

  • nms_threshold (float, optional) – IoU threshold for NMS. Defaults to 0.65.

  • score_thr (float, optional) – Score threshold to filter detections. Defaults to 0.01.

  • nms_pre (int, optional) – Number of top-k results before NMS. Defaults to -1 (all).

  • max_per_img (int, optional) – Number of top-k results after NMS. Defaults to -1 (all).


Decoded boxes, scores, and labels.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
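To make the roles of score_thr, nms_threshold, and max_per_img concrete, here is a minimal plain-Python sketch of score filtering followed by greedy NMS. It is a class-agnostic simplification, not the library's implementation. The names iou and nms_sketch are hypothetical, and combining class score and objectness by multiplication is an assumption based on the usual YOLOX convention.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, nms_threshold=0.65, score_thr=0.01, max_per_img=-1):
    """Return indices of kept boxes, highest score first."""
    # Drop low-scoring candidates, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep[:max_per_img] if max_per_img > 0 else keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
cls_scores = [0.9, 0.8, 0.7, 0.5]
objectness = [1.0, 1.0, 1.0, 0.01]
# Assumption: final score = class score * objectness.
scores = [c * o for c, o in zip(cls_scores, objectness)]
keep = nms_sketch(boxes, scores)
print(keep)
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed at the default threshold of 0.65, and the last box is dropped by the score threshold before NMS runs.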


get_l1_target(bbox_target, priors, eps=1e-08)[source]

  • bbox_target (Tensor) – Shape (n, 4) for ground-truth bboxes.

  • priors (Tensor) – Shape (n, 4) for prior boxes.

  • eps (float, optional) – Epsilon for numerical stability. Defaults to 1e-8.
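The entry above gives no description of the computed target. In the usual YOLOX formulation, the L1 target encodes a ground-truth box relative to its prior: center offsets in units of the stride, and log-scaled width and height. The sketch below follows that formulation for a single box. It assumes the box is in (x1, y1, x2, y2) format and the prior is (center_x, center_y, stride_w, stride_h); both layouts and the name l1_target_sketch are assumptions, not taken from this module.

```python
import math


def l1_target_sketch(box_xyxy, prior, eps=1e-8):
    """Encode one ground-truth box relative to one prior (illustrative only)."""
    x1, y1, x2, y2 = box_xyxy
    px, py, pw, ph = prior  # assumed layout: center x, center y, stride w, stride h
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    return (
        (cx - px) / pw,          # center offset in stride units
        (cy - py) / ph,
        math.log(w / pw + eps),  # eps guards against log(0) for degenerate boxes
        math.log(h / ph + eps),
    )


target = l1_target_sketch((8, 8, 24, 40), (16, 16, 8, 8))
print(target)
```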

preprocess_outputs(cls_outs, reg_outs, obj_outs, images_hw, point_generator, box_decoder)[source]

  • cls_outs (list[torch.Tensor]) – [N, C, H, W] per scale.

  • reg_outs (list[torch.Tensor]) – [N, 4, H, W] per scale.

  • obj_outs (list[torch.Tensor]) – [N, 1, H, W] per scale.

  • images_hw (list[tuple[int, int]]) – List of image sizes.

  • point_generator (MlvlPointGenerator) – Point generator.

  • box_decoder (YOLOXBBoxDecoder) – Box decoder.


Flattened outputs.

Return type:

tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
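The per-scale inputs above have shape [N, C, H, W], and the function returns flattened outputs. A common way to flatten such multi-level maps is to move channels last, merge the spatial dimensions, and concatenate across levels. The NumPy sketch below illustrates that idea only; flatten_levels is a hypothetical name, and the exact contents and ordering of the five returned tensors are not specified here.

```python
import numpy as np


def flatten_levels(outs):
    """Turn a list of [N, C, H, W] arrays into one [N, sum(H * W), C] array."""
    flat = []
    for out in outs:
        n, c, h, w = out.shape
        # Channels last, then merge the H and W axes into one "points" axis.
        flat.append(out.transpose(0, 2, 3, 1).reshape(n, h * w, c))
    # Concatenate all levels along the points axis.
    return np.concatenate(flat, axis=1)


# Toy inputs: batch of 2, 80 classes, three levels of decreasing resolution.
cls_outs = [
    np.random.rand(2, 80, 8, 8),
    np.random.rand(2, 80, 4, 4),
    np.random.rand(2, 80, 2, 2),
]
flat_cls = flatten_levels(cls_outs)
print(flat_cls.shape)  # (2, 84, 80): 64 + 16 + 4 points, 80 classes each
```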