vis4d.op.box.poolers.roi_pooler

Vis4D RoI Pooling module.

Classes

MultiScaleRoIAlign(sampling_ratio, *args, ...)

RoI Align supporting multi-scale inputs.

MultiScaleRoIPool(resolution, strides[, ...])

RoI Pool supporting multi-scale inputs.

MultiScaleRoIPooler(resolution, strides[, ...])

Wrapper for roi pooling that supports multi-scale feature maps.

class MultiScaleRoIAlign(sampling_ratio, *args, **kwargs)[source]

RoI Align supporting multi-scale inputs.

Creates an instance of the class.

class MultiScaleRoIPool(resolution, strides, canonical_box_size=224, canonical_level=4, aligned=True)[source]

RoI Pool supporting multi-scale inputs.

Multi-scale version of arbitrary RoI pooling operations.

Parameters:
  • resolution (tuple[int, int]) – Pooler resolution.

  • strides (list[int]) – Feature map strides relative to the input. The strides must be powers of 2 whose corresponding scales (1/stride) form a monotonically decreasing geometric sequence with a factor of 1/2, e.g., strides 4, 8, 16, 32.

  • canonical_box_size (int) – Canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).

  • canonical_level (int) – The feature map level index on which a canonically sized box should be placed. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature map with stride=16. The placement of every other box is determined from its size relative to canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.

  • aligned (bool) – For the roi_align op. Shift the box coordinates by -0.5 for better alignment with the two neighboring pixel indices.
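The constraint on strides can be illustrated with a short check. The following is a minimal sketch, assuming the pooler derives one pyramid level per stride as log2(stride); the helper name levels_from_strides is hypothetical and not part of the Vis4D API.

```python
from __future__ import annotations


def levels_from_strides(strides: list[int]) -> tuple[int, int]:
    """Return (min_level, max_level) for a list of feature map strides.

    Hypothetical helper for illustration only; not part of the Vis4D API.
    """
    if not strides:
        raise ValueError("strides must not be empty")
    levels = []
    for stride in strides:
        # A positive integer is a power of 2 iff exactly one bit is set.
        if stride <= 0 or stride & (stride - 1) != 0:
            raise ValueError(f"stride {stride} is not a power of 2")
        levels.append(stride.bit_length() - 1)  # log2 of a power of 2
    # Consecutive levels mean each stride is twice the previous one,
    # i.e. the scales 1/stride shrink by a factor of 1/2 per level.
    for prev, cur in zip(levels, levels[1:]):
        if cur != prev + 1:
            raise ValueError("strides do not form a pyramid with factor 2")
    return levels[0], levels[-1]


print(levels_from_strides([4, 8, 16, 32]))  # (2, 5)
```

Under this assumption, strides of 4, 8, 16, 32 correspond to levels 2 through 5, which is why the default canonical_level of 4 matches stride=16.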

class MultiScaleRoIPooler(resolution, strides, canonical_box_size=224, canonical_level=4, aligned=True)[source]

Wrapper for roi pooling that supports multi-scale feature maps.

Multi-scale version of arbitrary RoI pooling operations.

Parameters:
  • resolution (tuple[int, int]) – Pooler resolution.

  • strides (list[int]) – Feature map strides relative to the input. The strides must be powers of 2 whose corresponding scales (1/stride) form a monotonically decreasing geometric sequence with a factor of 1/2, e.g., strides 4, 8, 16, 32.

  • canonical_box_size (int) – Canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).

  • canonical_level (int) – The feature map level index on which a canonically sized box should be placed. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature map with stride=16. The placement of every other box is determined from its size relative to canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.

  • aligned (bool) – For the roi_align op. Shift the box coordinates by -0.5 for better alignment with the two neighboring pixel indices.
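The box placement rule described for canonical_box_size and canonical_level can be sketched as follows. This is an illustrative sketch of the FPN heuristic, not the library's implementation: the function name assign_level, the (x1, y1, x2, y2) box layout, the eps value, and the clamping to the available levels are assumptions.

```python
import math


def assign_level(
    box: tuple,
    min_level: int,
    max_level: int,
    canonical_box_size: int = 224,
    canonical_level: int = 4,
    eps: float = 1e-8,  # assumed small constant to avoid log2(0)
) -> int:
    """Pick the feature level for one box given as (x1, y1, x2, y2).

    Hypothetical helper for illustration only; not part of the Vis4D API.
    """
    x1, y1, x2, y2 = box
    # Box size is sqrt(area), matching the canonical_box_size definition.
    box_size = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    # Doubling the box size (4x the area) moves the box up one level.
    level = math.floor(
        canonical_level + math.log2(box_size / canonical_box_size + eps)
    )
    # Assumed: boxes outside the pyramid are clamped to the nearest level.
    return min(max(level, min_level), max_level)


# Levels 2..5 correspond to strides 4, 8, 16, 32.
print(assign_level((0, 0, 224, 224), 2, 5))  # 4
print(assign_level((0, 0, 448, 448), 2, 5))  # 5
print(assign_level((0, 0, 32, 32), 2, 5))    # 2
```

A 448x448 box has 4x the area of the canonical 224x224 box and therefore lands on canonical_level+1, as stated in the parameter description.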

forward(features, boxes)[source]

Torchvision-based RoI pooling operation.

Parameters:
  • features (list[Tensor]) – List of image feature tensors (e.g., FPN levels), each in NCHW format.

  • boxes (list[Tensor]) – List of proposals (per image).

Returns:

Pooled features in NCHW format, where N is the total number of boxes, C is the feature dimension, and H and W are the RoI size. Boxes are concatenated along dimension 0 for all batch elements.

Return type:

torch.Tensor
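Because the pooled boxes of all images are concatenated along dimension 0, a caller that needs per-image results has to split the output by the number of boxes per image. The sketch below shows that bookkeeping on plain Python sequences; split_by_counts is a hypothetical helper, and it assumes the output rows follow the order of the input boxes list.

```python
def split_by_counts(rows, counts):
    """Split a flat sequence of pooled rows back into per-image groups.

    Hypothetical helper for illustration only; not part of the Vis4D API.
    Assumes rows are ordered image by image, as in the boxes argument.
    """
    if sum(counts) != len(rows):
        raise ValueError("counts must sum to the number of rows")
    groups, start = [], 0
    for count in counts:
        groups.append(rows[start:start + count])
        start += count
    return groups


# Three images with 2, 0 and 3 proposals give N = 5 pooled rows in total.
pooled_rows = ["a0", "a1", "c0", "c1", "c2"]
print(split_by_counts(pooled_rows, [2, 0, 3]))
# [['a0', 'a1'], [], ['c0', 'c1', 'c2']]
```

With an actual torch.Tensor, the same split can be done with torch.split by passing the list of per-image box counts as the section sizes along dimension 0.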