vis4d.op.layer.positional_encoding

Positional encoding for transformer.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Classes

  • LearnedPositionalEncoding(num_feats[, ...]) – Position embedding with learnable embedding weights.

  • SinePositionalEncoding(num_feats[, ...]) – Position encoding with sine and cosine functions.

  • SinePositionalEncoding3D(num_feats[, ...]) – 3D position encoding with sine and cosine functions.

class LearnedPositionalEncoding(num_feats, row_num_embed=50, col_num_embed=50)

Position embedding with learnable embedding weights.

Initialization for LearnedPositionalEncoding.

Parameters:
  • num_feats (int) – The feature dimension for each position along the x-axis or y-axis. The final returned dimension for each position is twice this value.

  • row_num_embed (int, optional) – The dictionary size of row embeddings. Defaults to 50.

  • col_num_embed (int, optional) – The dictionary size of column embeddings. Defaults to 50.

forward(mask)

Forward function for LearnedPositionalEncoding.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, while zero values mark valid positions for this image. Shape [bs, h, w].

Returns:

Returned position embedding with shape [bs, num_feats*2, h, w].

Return type:

pos (Tensor)
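The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the vis4d implementation: the helper names `make_tables` and `learned_encoding` are hypothetical, the tables are filled with random values instead of trained weights, and the channel order (column features first, then row features) follows the mmdetection convention and is assumed here.

```python
import random


def make_tables(num_feats, row_num_embed=50, col_num_embed=50, seed=0):
    """Build two hypothetical embedding tables filled with random values."""
    rng = random.Random(seed)
    row_embed = [[rng.random() for _ in range(num_feats)]
                 for _ in range(row_num_embed)]
    col_embed = [[rng.random() for _ in range(num_feats)]
                 for _ in range(col_num_embed)]
    return row_embed, col_embed


def learned_encoding(row_embed, col_embed, h, w):
    """Return a nested list of shape [2 * num_feats][h][w] for one image."""
    num_feats = len(row_embed[0])
    # A table with N entries can only index N rows (or columns).
    if h > len(row_embed) or w > len(col_embed):
        raise ValueError("feature map is larger than the embedding tables")
    pos = [[[0.0] * w for _ in range(h)] for _ in range(2 * num_feats)]
    for y in range(h):
        for x in range(w):
            # Assumed order: column (x) features first, then row (y) features.
            feats = col_embed[x] + row_embed[y]
            for c, value in enumerate(feats):
                pos[c][y][x] = value
    return pos


row_embed, col_embed = make_tables(num_feats=4)
pos = learned_encoding(row_embed, col_embed, h=3, w=5)
print(len(pos), len(pos[0]), len(pos[0][0]))  # 8 3 5
```

In this sketch the encoding depends only on the mask's height and width, and a feature map taller than row_num_embed or wider than col_num_embed cannot be indexed.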

init_weights()

Initialize the weights of position embedding.

Return type:

None

class SinePositionalEncoding(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)

Position encoding with sine and cosine functions.

See "End-to-End Object Detection with Transformers" for details.

Initialization for SinePositionalEncoding.

Parameters:
  • num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.

  • temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.

  • normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.

  • scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.

  • eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.

  • offset (float, optional) – Offset added to the embedding when normalizing. Defaults to 0.
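How these parameters interact can be sketched for a single scalar coordinate. The sketch follows the formulation used in DETR and mmdetection, which this module is assumed to match. The function names `normalize_coord` and `sine_embed` are hypothetical and not part of vis4d.

```python
import math


def normalize_coord(coord, last, scale=2 * math.pi, eps=1e-6, offset=0.0):
    """Map a coordinate into roughly [0, scale]; `last` is the largest coordinate."""
    return (coord + offset) / (last + eps) * scale


def sine_embed(coord, num_feats, temperature=10000):
    """Encode one coordinate as num_feats interleaved sin/cos values."""
    feats = []
    for i in range(num_feats):
        # Channels 2k and 2k+1 share one frequency; it falls as k grows.
        dim_t = temperature ** (2 * (i // 2) / num_feats)
        angle = coord / dim_t
        feats.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return feats


print(sine_embed(0.0, num_feats=4))  # [0.0, 1.0, 0.0, 1.0]
x = normalize_coord(3, last=5)       # used only when normalize is True
print(len(sine_embed(x, num_feats=4)))  # 4
```

Each axis produces num_feats values per position, so concatenating the y and x encodings gives the 2 * num_feats channels mentioned above.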

forward(mask)

Forward function for SinePositionalEncoding.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, while zero values mark valid positions for this image. Shape [bs, h, w].

Returns:

Returned position embedding with shape [bs, num_feats*2, h, w].

Return type:

pos (Tensor)
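The mask's role can be illustrated with a small pure-Python sketch. It assumes, as in DETR and mmdetection, that per-pixel coordinates come from a cumulative sum over the valid (zero) entries of the mask. The helper `cumulative_positions` is hypothetical and handles a single image without the batch dimension.

```python
def cumulative_positions(mask):
    """mask: h x w nested list, non-zero = ignored. Returns (y_embed, x_embed)."""
    h, w = len(mask), len(mask[0])
    not_mask = [[0 if v else 1 for v in row] for row in mask]
    y_embed = [[0] * w for _ in range(h)]
    x_embed = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            above = y_embed[y - 1][x] if y > 0 else 0
            left = x_embed[y][x - 1] if x > 0 else 0
            # Only valid positions advance the running coordinate.
            y_embed[y][x] = above + not_mask[y][x]
            x_embed[y][x] = left + not_mask[y][x]
    return y_embed, x_embed


# The last column is padding (ignored).
mask = [[0, 0, 1],
        [0, 0, 1]]
y_embed, x_embed = cumulative_positions(mask)
print(x_embed)  # [[1, 2, 2], [1, 2, 2]]
print(y_embed)  # [[1, 1, 0], [2, 2, 0]]
```

Under this assumption, padded positions do not advance the coordinate, so valid pixels receive the same encoding regardless of how much padding the batch adds.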

class SinePositionalEncoding3D(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)

3D Position encoding with sine and cosine functions.

Initialization for SinePositionalEncoding3D.

Parameters:
  • num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.

  • temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.

  • normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.

  • scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.

  • eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.

  • offset (float, optional) – Offset added to the embedding when normalizing. Defaults to 0.

forward(mask)

Forward function for SinePositionalEncoding3D.

Parameters:

mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions, while zero values mark valid positions for this image. Shape [bs, t, h, w].

Returns:

Returned position embedding with shape [bs, num_feats*2, h, w].

Return type:

pos (Tensor)
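For the extra t dimension, a minimal sketch of how a temporal coordinate could be derived is shown here. It assumes the 3D variant treats the time axis like the spatial axes, with a cumulative sum over valid entries as in mmdetection. The helper `temporal_positions` is hypothetical, and the way the temporal encoding is combined with the spatial one is not covered.

```python
def temporal_positions(mask):
    """mask: t x h x w nested list, non-zero = ignored. Returns z_embed."""
    t, h, w = len(mask), len(mask[0]), len(mask[0][0])
    z_embed = [[[0] * w for _ in range(h)] for _ in range(t)]
    for k in range(t):
        for y in range(h):
            for x in range(w):
                prev = z_embed[k - 1][y][x] if k > 0 else 0
                # A valid entry advances the temporal coordinate by one.
                z_embed[k][y][x] = prev + (0 if mask[k][y][x] else 1)
    return z_embed


# Three frames of size 1 x 2; the second pixel is ignored in frame 1.
mask = [[[0, 0]],
        [[0, 1]],
        [[0, 0]]]
z_embed = temporal_positions(mask)
print(z_embed)  # [[[1, 1]], [[2, 1]], [[3, 2]]]
```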