vis4d.op.box.encoder.delta_xywh

XYWH Delta coder for 2D boxes.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Functions

bbox2delta(proposals, gt_boxes[, means, stds])

Compute deltas of proposals w.r.t. gt.

delta2bbox(rois, deltas[, means, stds, ...])

Apply deltas to shift/scale base boxes.

Classes

DeltaXYWHBBoxDecoder([target_means, ...])

Delta XYWH BBox decoder.

DeltaXYWHBBoxEncoder([target_means, target_stds])

Delta XYWH BBox encoder.

class DeltaXYWHBBoxDecoder(target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(1.0, 1.0, 1.0, 1.0), wh_ratio_clip=0.016)[source]

Delta XYWH BBox decoder.

Following the practice in R-CNN, it decodes delta (dx, dy, dw, dh) back to original bbox (x1, y1, x2, y2).

Creates an instance of the class.

Parameters:
  • target_means (tuple, optional) – Denormalizing means of target for delta coordinates. Defaults to (0.0, 0.0, 0.0, 0.0).

  • target_stds (tuple, optional) – Denormalizing standard deviation of target for delta coordinates. Defaults to (1.0, 1.0, 1.0, 1.0).

  • wh_ratio_clip (float, optional) – Maximum aspect ratio for boxes. Defaults to 16/1000.

__call__(boxes, box_deltas)[source]

Apply the box offsets box_deltas to the base boxes.

Parameters:
  • boxes (Tensor) – Basic boxes. Shape (B, N, 4) or (N, 4)

  • box_deltas (Tensor) – Encoded offsets with respect to each box. Has shape (B, N, num_classes * 4), (B, N, 4), (N, num_classes * 4), or (N, 4). Note that N = num_anchors * W * H when boxes is a grid of anchors. Offset encoding follows [1].

Returns:

Decoded boxes.

Return type:

Tensor
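
The decoding step can be sketched in plain NumPy. This is a hedged illustration of the delta-to-box math described above, not the vis4d implementation, which operates on torch.Tensor and also handles the batched and per-class shapes:

```python
import numpy as np

def decode_deltas(boxes, deltas, means=(0., 0., 0., 0.),
                  stds=(1., 1., 1., 1.), wh_ratio_clip=0.016):
    """Shift/scale (x1, y1, x2, y2) boxes by (dx, dy, dw, dh) deltas."""
    # Denormalize the network outputs.
    deltas = deltas * np.asarray(stds) + np.asarray(means)
    dx, dy, dw, dh = deltas.T
    # Clamp dw/dh so exp() cannot produce extreme aspect ratios.
    max_ratio = abs(np.log(wh_ratio_clip))
    dw = np.clip(dw, -max_ratio, max_ratio)
    dh = np.clip(dh, -max_ratio, max_ratio)
    # Base box centers and sizes.
    px = (boxes[:, 0] + boxes[:, 2]) * 0.5
    py = (boxes[:, 1] + boxes[:, 3]) * 0.5
    pw = boxes[:, 2] - boxes[:, 0]
    ph = boxes[:, 3] - boxes[:, 1]
    # Shift the centers, scale the sizes, convert back to corners.
    gx = px + pw * dx
    gy = py + ph * dy
    gw = pw * np.exp(dw)
    gh = ph * np.exp(dh)
    return np.stack([gx - gw * 0.5, gy - gh * 0.5,
                     gx + gw * 0.5, gy + gh * 0.5], axis=-1)

boxes = np.array([[0., 0., 10., 10.]])
# Zero deltas leave the box unchanged; dx = 0.1 shifts the center by 0.1 * w.
print(decode_deltas(boxes, np.array([[0.1, 0., 0., 0.]])))  # [[1. 0. 11. 10.]]
```

Note how dx and dy are scaled by the base box size before shifting, so deltas are scale-invariant across anchors of different sizes.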

class DeltaXYWHBBoxEncoder(target_means=(0.0, 0.0, 0.0, 0.0), target_stds=(1.0, 1.0, 1.0, 1.0))[source]

Delta XYWH BBox encoder.

Following the practice in R-CNN, it encodes bbox (x1, y1, x2, y2) into delta (dx, dy, dw, dh).

Creates an instance of the class.

Parameters:
  • target_means (tuple, optional) – Denormalizing means of target for delta coordinates. Defaults to (0.0, 0.0, 0.0, 0.0).

  • target_stds (tuple, optional) – Denormalizing standard deviation of target for delta coordinates. Defaults to (1.0, 1.0, 1.0, 1.0).

__call__(boxes, targets)[source]

Get box regression transformation deltas.

Used to transform target boxes into target regression parameters.

Parameters:
  • boxes (Tensor) – Source boxes, e.g., object proposals.

  • targets (Tensor) – Target of the transformation, e.g., ground-truth boxes.

Returns:

Box transformation deltas

Return type:

Tensor

bbox2delta(proposals, gt_boxes, means=(0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0))[source]

Compute deltas of proposals w.r.t. gt.

We usually compute the deltas of x, y, w, h of proposals w.r.t. ground-truth boxes to obtain regression targets. This is the inverse function of delta2bbox().

Parameters:
  • proposals (Tensor) – Boxes to be transformed, shape (N, …, 4).

  • gt_boxes (Tensor) – Gt boxes to be used as base, shape (N, …, 4).

  • means (Sequence[float]) – Denormalizing means for delta coordinates.

  • stds (Sequence[float]) – Denormalizing standard deviation for delta coordinates.

Returns:

Deltas with shape (N, 4), where the columns represent dx, dy, dw, dh.

Return type:

Tensor
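
The encoding side can likewise be sketched in plain NumPy. This is a hedged illustration of the bbox2delta math, not the vis4d implementation, which operates on torch.Tensor:

```python
import numpy as np

def encode_deltas(proposals, gt_boxes, means=(0., 0., 0., 0.),
                  stds=(1., 1., 1., 1.)):
    """Compute (dx, dy, dw, dh) regression targets for proposals w.r.t. gt."""
    # Proposal centers and sizes.
    px = (proposals[:, 0] + proposals[:, 2]) * 0.5
    py = (proposals[:, 1] + proposals[:, 3]) * 0.5
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    # Ground-truth centers and sizes.
    gx = (gt_boxes[:, 0] + gt_boxes[:, 2]) * 0.5
    gy = (gt_boxes[:, 1] + gt_boxes[:, 3]) * 0.5
    gw = gt_boxes[:, 2] - gt_boxes[:, 0]
    gh = gt_boxes[:, 3] - gt_boxes[:, 1]
    # Relative center offsets and log-space size ratios.
    deltas = np.stack([(gx - px) / pw, (gy - py) / ph,
                       np.log(gw / pw), np.log(gh / ph)], axis=-1)
    # Normalize so the regression targets are roughly unit-variance.
    return (deltas - np.asarray(means)) / np.asarray(stds)

proposals = np.array([[0., 0., 10., 10.]])
gt_boxes = np.array([[1., 0., 11., 10.]])
print(encode_deltas(proposals, gt_boxes))  # [[0.1 0. 0. 0.]]
```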

delta2bbox(rois, deltas, means=(0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0), wh_ratio_clip=0.016)[source]

Apply deltas to shift/scale base boxes.

Typically the rois are anchor or proposed bounding boxes and the deltas are network outputs used to shift/scale those boxes. This is the inverse function of bbox2delta().

Parameters:
  • rois (Tensor) – Boxes to be transformed. Has shape (N, 4).

  • deltas (Tensor) – Encoded offsets relative to each roi. Has shape (N, num_classes * 4) or (N, 4). Note that N = num_base_anchors * W * H when rois is a grid of anchors. Offset encoding follows [1].

  • means (Sequence[float]) – Denormalizing means for delta coordinates. Default (0., 0., 0., 0.).

  • stds (Sequence[float]) – Denormalizing standard deviation for delta coordinates. Default (1., 1., 1., 1.).

  • wh_ratio_clip (float) – Maximum aspect ratio for boxes. Default 16 / 1000.

Returns:

Boxes with shape (N, num_classes * 4) or (N, 4), where the columns represent tl_x, tl_y, br_x, br_y.

Return type:

Tensor
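
Since delta2bbox() is the inverse of bbox2delta(), encoding and then decoding with the same means/stds recovers the original boxes. A compact NumPy round-trip sketch under that assumption (wh_ratio_clip clipping is omitted here, since the deltas stay well within range):

```python
import numpy as np

def bbox2delta_np(proposals, gt, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    # Centers and sizes of proposals and ground truth.
    px, py = (proposals[:, 0] + proposals[:, 2]) / 2, (proposals[:, 1] + proposals[:, 3]) / 2
    pw, ph = proposals[:, 2] - proposals[:, 0], proposals[:, 3] - proposals[:, 1]
    gx, gy = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    d = np.stack([(gx - px) / pw, (gy - py) / ph,
                  np.log(gw / pw), np.log(gh / ph)], axis=-1)
    return (d - np.asarray(means)) / np.asarray(stds)

def delta2bbox_np(rois, deltas, means=(0., 0., 0., 0.), stds=(1., 1., 1., 1.)):
    # Denormalize, then shift centers and scale sizes.
    d = deltas * np.asarray(stds) + np.asarray(means)
    px, py = (rois[:, 0] + rois[:, 2]) / 2, (rois[:, 1] + rois[:, 3]) / 2
    pw, ph = rois[:, 2] - rois[:, 0], rois[:, 3] - rois[:, 1]
    gx, gy = px + pw * d[:, 0], py + ph * d[:, 1]
    gw, gh = pw * np.exp(d[:, 2]), ph * np.exp(d[:, 3])
    return np.stack([gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2], axis=-1)

rois = np.array([[0., 0., 10., 10.], [5., 5., 20., 30.]])
gt = np.array([[2., 1., 12., 9.], [4., 8., 18., 32.]])
stds = (0.1, 0.1, 0.2, 0.2)  # typical non-unit stds used in R-CNN heads
recovered = delta2bbox_np(rois, bbox2delta_np(rois, gt, stds=stds), stds=stds)
print(np.allclose(recovered, gt))  # True
```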

References

[1] Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, https://arxiv.org/abs/1311.2524