vis4d.op.layer.positional_encoding

Positional encoding for transformer.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Classes

`LearnedPositionalEncoding`(num_feats[, ...])	Position embedding with learnable embedding weights.
`SinePositionalEncoding`(num_feats[, ...])	Position encoding with sine and cosine functions.
`SinePositionalEncoding3D`(num_feats[, ...])	3D Position encoding with sine and cosine functions.

class LearnedPositionalEncoding(num_feats, row_num_embed=50, col_num_embed=50)

Position embedding with learnable embedding weights.

Initialization for LearnedPositionalEncoding.

Parameters:

num_feats (int) – The feature dimension for each position along the x-axis or y-axis. The final returned dimension for each position is twice this value.
row_num_embed (int, optional) – The dictionary size of the row embeddings. Defaults to 50.
col_num_embed (int, optional) – The dictionary size of the column embeddings. Defaults to 50.

forward(mask)

Forward function for LearnedPositionalEncoding.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, and zero values mark valid positions for this image. Shape [bs, h, w].

Returns:

pos, the position embedding with shape [bs, num_feats*2, h, w].

Return type:

Tensor
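The mask convention above (non-zero means ignored, zero means valid) is typically what results from padding images of different sizes into one batch. A minimal sketch of building such a mask in plain PyTorch follows; the helper name `make_padding_mask` and its arguments are hypothetical and not part of vis4d.

```python
import torch


def make_padding_mask(image_sizes, pad_h, pad_w):
    """Build a [bs, h, w] mask: 1 marks padded (ignored) pixels, 0 valid ones.

    Hypothetical helper for illustration; assumes every image is placed in
    the top-left corner of a (pad_h, pad_w) canvas.
    """
    mask = torch.ones((len(image_sizes), pad_h, pad_w), dtype=torch.uint8)
    for i, (h, w) in enumerate(image_sizes):
        mask[i, :h, :w] = 0  # the top-left h x w region holds real pixels
    return mask


# Two images of size 2x3 and 4x4, padded to a common 4x4 canvas.
mask = make_padding_mask([(2, 3), (4, 4)], pad_h=4, pad_w=4)
```

A mask built this way has the [bs, h, w] shape that forward(mask) expects.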

init_weights()

Initialize the weights of position embedding.

Return type: None
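To illustrate how a learned encoding of this kind can produce a [bs, num_feats*2, h, w] tensor, here is a minimal sketch in plain PyTorch. It is not the vis4d implementation: the class name `LearnedEncodingSketch`, the uniform initialization, and the x-then-y channel order are all assumptions made for the example.

```python
import torch
from torch import nn


class LearnedEncodingSketch(nn.Module):
    """Illustrative learned 2D position embedding (not the vis4d class)."""

    def __init__(self, num_feats, row_num_embed=50, col_num_embed=50):
        super().__init__()
        # One learnable vector per row index and per column index.
        self.row_embed = nn.Embedding(row_num_embed, num_feats)
        self.col_embed = nn.Embedding(col_num_embed, num_feats)
        # Assumed init scheme (uniform in [0, 1)); the library may differ.
        nn.init.uniform_(self.row_embed.weight)
        nn.init.uniform_(self.col_embed.weight)

    def forward(self, mask):
        # Only the shape of the mask is used in this sketch.
        bs, h, w = mask.shape
        x_embed = self.col_embed(torch.arange(w, device=mask.device))  # [w, C]
        y_embed = self.row_embed(torch.arange(h, device=mask.device))  # [h, C]
        pos = torch.cat(
            (
                x_embed.unsqueeze(0).expand(h, w, -1),  # same per column
                y_embed.unsqueeze(1).expand(h, w, -1),  # same per row
            ),
            dim=-1,
        )  # [h, w, 2 * C]
        # Move channels first and copy the grid for every batch element.
        return pos.permute(2, 0, 1).unsqueeze(0).repeat(bs, 1, 1, 1)


encoder = LearnedEncodingSketch(num_feats=8)
pos = encoder(torch.zeros(2, 5, 7, dtype=torch.uint8))  # [2, 16, 5, 7]
```

In this sketch h must not exceed row_num_embed and w must not exceed col_num_embed, since the embedding tables are indexed by row and column position.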

class SinePositionalEncoding(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)

Position encoding with sine and cosine functions.

See End-to-End Object Detection with Transformers for details.

Initialization for SinePositionalEncoding.

Parameters:

num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.
temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.
normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.
scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
offset (float, optional) – Offset added to the embedding during normalization. Defaults to 0.
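The interplay of these parameters is easier to follow in code. The function below is a minimal sketch of a 2D sine/cosine encoding in the style of the DETR paper, written in plain PyTorch. It is not the vis4d source: the function name `sine_encoding_2d`, the y-then-x channel order, and the exact placement of `offset` and `eps` are assumptions, and `num_feats` is assumed to be even.

```python
import math

import torch


def sine_encoding_2d(mask, num_feats, temperature=10000, normalize=False,
                     scale=2 * math.pi, eps=1e-6, offset=0.0):
    """Illustrative sine/cosine encoding for a [bs, h, w] padding mask."""
    valid = (mask == 0).to(torch.float32)  # 1 where the pixel is valid
    # Running count of valid pixels gives each pixel its 1-based coordinate.
    y_embed = valid.cumsum(1)
    x_embed = valid.cumsum(2)
    if normalize:
        # Rescale coordinates to roughly [0, scale] using the last row/column.
        y_embed = (y_embed + offset) / (y_embed[:, -1:, :] + eps) * scale
        x_embed = (x_embed + offset) / (x_embed[:, :, -1:] + eps) * scale
    # Channel pairs (2i, 2i + 1) share the divisor temperature^(2i / num_feats).
    idx = torch.arange(num_feats, dtype=torch.float32)
    dim_t = temperature ** (
        2 * torch.div(idx, 2, rounding_mode="floor") / num_feats
    )
    pos_x = x_embed[..., None] / dim_t  # [bs, h, w, num_feats]
    pos_y = y_embed[..., None] / dim_t
    # Sine on even channels, cosine on odd channels, interleaved.
    pos_x = torch.stack(
        (pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=-1
    ).flatten(-2)
    pos_y = torch.stack(
        (pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=-1
    ).flatten(-2)
    # Concatenate y and x features, then move channels first.
    return torch.cat((pos_y, pos_x), dim=-1).permute(0, 3, 1, 2)


mask = torch.zeros(1, 3, 4, dtype=torch.uint8)  # every pixel valid
pos = sine_encoding_2d(mask, num_feats=4)  # [1, 8, 3, 4]
pos_norm = sine_encoding_2d(mask, num_feats=4, normalize=True)
```

A larger temperature stretches the wavelengths of the higher channels, while normalize makes the encoding depend on relative rather than absolute pixel position.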

forward(mask)

Forward function for SinePositionalEncoding.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, and zero values mark valid positions for this image. Shape [bs, h, w].

Returns:

pos, the position embedding with shape [bs, num_feats*2, h, w].

Return type:

Tensor

class SinePositionalEncoding3D(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)

3D Position encoding with sine and cosine functions.

Initialization for SinePositionalEncoding3D.

Parameters:

num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.
temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.
normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.
scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
offset (float, optional) – Offset added to the embedding during normalization. Defaults to 0.

forward(mask)

Forward function for SinePositionalEncoding3D.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, and zero values mark valid positions for this image. Shape [bs, t, h, w].

Returns:

pos, the position embedding with shape [bs, t, num_feats*2, h, w].

Return type:

Tensor
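One way to extend the sine/cosine scheme to a [bs, t, h, w] mask is to add a temporal term on top of the concatenated spatial terms. The sketch below does this in plain PyTorch. It is an illustration under stated assumptions, not the vis4d source: the names `sine_encoding_3d` and `_sin_cos` are hypothetical, `num_feats` is assumed to be even, and the output layout [bs, t, 2 * num_feats, h, w], which keeps the temporal axis, is a choice made for this example.

```python
import math

import torch


def _sin_cos(embed, dim_t):
    """Interleave sin (even channels) and cos (odd channels) of embed / dim_t."""
    pos = embed[..., None] / dim_t
    return torch.stack(
        (pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1
    ).flatten(-2)


def sine_encoding_3d(mask, num_feats, temperature=10000, normalize=False,
                     scale=2 * math.pi, eps=1e-6, offset=0.0):
    """Illustrative sine/cosine encoding for a [bs, t, h, w] padding mask."""
    assert mask.dim() == 4, "expected a mask of shape [bs, t, h, w]"
    valid = (mask == 0).to(torch.float32)
    z_embed = valid.cumsum(1)  # temporal coordinate
    y_embed = valid.cumsum(2)  # row coordinate
    x_embed = valid.cumsum(3)  # column coordinate
    if normalize:
        z_embed = (z_embed + offset) / (z_embed[:, -1:] + eps) * scale
        y_embed = (y_embed + offset) / (y_embed[:, :, -1:] + eps) * scale
        x_embed = (x_embed + offset) / (x_embed[:, :, :, -1:] + eps) * scale
    idx = torch.arange(num_feats, dtype=torch.float32)
    dim_t = temperature ** (
        2 * torch.div(idx, 2, rounding_mode="floor") / num_feats
    )
    # The temporal term spans all 2 * num_feats channels so it can be added.
    idx_z = torch.arange(2 * num_feats, dtype=torch.float32)
    dim_t_z = temperature ** (
        2 * torch.div(idx_z, 2, rounding_mode="floor") / (2 * num_feats)
    )
    pos_yx = torch.cat(
        (_sin_cos(y_embed, dim_t), _sin_cos(x_embed, dim_t)), dim=-1
    )  # [bs, t, h, w, 2 * num_feats]
    pos = pos_yx + _sin_cos(z_embed, dim_t_z)
    return pos.permute(0, 1, 4, 2, 3)  # [bs, t, 2 * num_feats, h, w]


mask = torch.zeros(2, 3, 4, 5, dtype=torch.uint8)  # every position valid
pos = sine_encoding_3d(mask, num_feats=4)
```

Adding the temporal term, rather than concatenating it, keeps the channel count at 2 * num_feats, the same as in the 2D case.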