vis4d.op.box.poolers.roi_pooler

Vis4D RoI Pooling module.

Classes

`MultiScaleRoIAlign`(sampling_ratio, *args, ...)	RoI Align supporting multi-scale inputs.
`MultiScaleRoIPool`(resolution, strides[, ...])	RoI Pool supporting multi-scale inputs.
`MultiScaleRoIPooler`(resolution, strides[, ...])	Wrapper for roi pooling that supports multi-scale feature maps.

class MultiScaleRoIAlign(sampling_ratio, *args, **kwargs)

RoI Align supporting multi-scale inputs.

Creates an instance of the class.

class MultiScaleRoIPool(resolution, strides, canonical_box_size=224, canonical_level=4, aligned=True)

RoI Pool supporting multi-scale inputs.

Multi-scale version of arbitrary RoI pooling operations.

Parameters:

resolution (tuple[int, int]) – Pooler resolution.
strides (list[int]) – Feature map strides relative to the input. The strides must be powers of 2, such that the corresponding scales (1/stride) form a monotonically decreasing geometric sequence with a factor of 1/2.
canonical_box_size (int) – Canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).
canonical_level (int) – The feature map level index to which a canonically sized box is assigned. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature map with stride=16. The placement of every other box is determined by its size relative to canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.
aligned (bool) – Only used by the roi_align op. If True, shift the box coordinates by -0.5 for better alignment with the two neighboring pixel indices.
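
The level assignment described under canonical_level can be sketched in plain Python. This is a minimal illustration of the FPN-style heuristic, not the module's actual code: the helper name `assign_level`, the `min_level`/`max_level` bounds (here levels 2 to 5, i.e., strides 4 to 32) and the small epsilon are assumptions made for the example.

```python
import math


def assign_level(box, min_level=2, max_level=5,
                 canonical_box_size=224, canonical_level=4):
    """Map one (x1, y1, x2, y2) box to a feature level (hypothetical helper)."""
    x1, y1, x2, y2 = box
    # Box size is the square root of the box area, in input pixels.
    box_size = math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))
    # The epsilon (an assumption) keeps log2 finite for zero-area boxes.
    level = math.floor(
        canonical_level + math.log2(box_size / canonical_box_size + 1e-8)
    )
    # Clamp to the levels that actually have a feature map.
    return min(max(level, min_level), max_level)


print(assign_level((0, 0, 224, 224)))  # 4: canonical box, stride 16
print(assign_level((0, 0, 448, 448)))  # 5: 4x the canonical area
print(assign_level((0, 0, 112, 112)))  # 3: 1/4 of the canonical area
```

Very small or very large boxes fall outside the available levels and are clamped to the finest or coarsest feature map, respectively.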

class MultiScaleRoIPooler(resolution, strides, canonical_box_size=224, canonical_level=4, aligned=True)

Wrapper for roi pooling that supports multi-scale feature maps.

Multi-scale version of arbitrary RoI pooling operations.

Parameters:

resolution (tuple[int, int]) – Pooler resolution.
strides (list[int]) – Feature map strides relative to the input. The strides must be powers of 2, such that the corresponding scales (1/stride) form a monotonically decreasing geometric sequence with a factor of 1/2.
canonical_box_size (int) – Canonical box size in pixels (sqrt(box area)). The default is heuristically defined as 224 pixels in the FPN paper (based on ImageNet pre-training).
canonical_level (int) – The feature map level index to which a canonically sized box is assigned. The default is defined as level 4 (stride=16) in the FPN paper, i.e., a box of size 224x224 will be placed on the feature map with stride=16. The placement of every other box is determined by its size relative to canonical_box_size. For example, a box whose area is 4x that of a canonical box should be used to pool features from feature level canonical_level+1.
aligned (bool) – Only used by the roi_align op. If True, shift the box coordinates by -0.5 for better alignment with the two neighboring pixel indices.
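
The constraint on strides can be checked with a few lines of Python. The sketch below assumes the strides are listed from the finest to the coarsest level (e.g., [4, 8, 16, 32]), so that the scales 1/stride halve at each step; `strides_to_levels` is a hypothetical helper for illustration and is not part of vis4d.

```python
import math


def strides_to_levels(strides):
    """Return (min_level, max_level) for a list of strides (hypothetical helper)."""
    levels = [math.log2(s) for s in strides]
    # Each stride must be a power of 2, i.e., have an integer log2.
    if any(not lvl.is_integer() for lvl in levels):
        raise ValueError("every stride must be a power of 2")
    # Consecutive levels must differ by exactly one (the stride doubles).
    if any(b - a != 1 for a, b in zip(levels, levels[1:])):
        raise ValueError("strides must double from one level to the next")
    return int(levels[0]), int(levels[-1])


print(strides_to_levels([4, 8, 16, 32]))  # (2, 5)
```

With these strides, the default canonical_level=4 corresponds to the third feature map in the list (stride 16).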

forward(features, boxes)

Torchvision-based RoI pooling operation.

Parameters:

features (list[Tensor]) – List of image feature tensors (e.g., fpn levels) - NCHW format.
boxes (list[Tensor]) – List of proposals (per image).

Returns:

Pooled features in NCHW format, where N is the total number of boxes, HW is the RoI size, and C is the feature dimension. Boxes are concatenated along dimension 0 for all batch elements.

Return type:

torch.Tensor
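
To make the output layout concrete, the following plain-Python sketch shows how per-image box lists collapse into a single leading dimension. Both helpers (`flatten_boxes`, `pooled_shape`) are hypothetical and only mimic the bookkeeping. The (image index, x1, y1, x2, y2) row layout is the one accepted by torchvision's RoI ops, and whether this module builds it the same way internally is an assumption.

```python
def flatten_boxes(boxes_per_image):
    """Concatenate per-image boxes, prefixing each with its image index."""
    rois = []
    for batch_idx, boxes in enumerate(boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            rois.append((batch_idx, x1, y1, x2, y2))
    return rois


def pooled_shape(boxes_per_image, num_channels, resolution):
    """Shape of the pooled output: (total boxes, C, pooled H, pooled W)."""
    total = sum(len(boxes) for boxes in boxes_per_image)
    return (total, num_channels, resolution[0], resolution[1])


# Two images: the first has two proposals, the second has one.
boxes = [[(0, 0, 10, 10), (5, 5, 20, 20)], [(1, 1, 4, 4)]]
print(flatten_boxes(boxes))
print(pooled_shape(boxes, num_channels=256, resolution=(7, 7)))  # (3, 256, 7, 7)
```

Because boxes from all images share dimension 0, downstream code needs the per-image box counts to split the pooled tensor back into per-image chunks.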