vis4d.op.layer.positional_encoding
Positional encoding for transformer.
Modified from mmdetection (https://github.com/open-mmlab/mmdetection).
Classes
LearnedPositionalEncoding | Position embedding with learnable embedding weights.
SinePositionalEncoding | Position encoding with sine and cosine functions.
SinePositionalEncoding3D | 3D Position encoding with sine and cosine functions.
- class LearnedPositionalEncoding(num_feats, row_num_embed=50, col_num_embed=50)
Position embedding with learnable embedding weights.
Initialization for LearnedPositionalEncoding.
- Parameters:
num_feats (int) – The feature dimension for each position along the x-axis or y-axis. The final returned dimension for each position is twice this value.
row_num_embed (int, optional) – The dictionary size of row embeddings. Defaults to 50.
col_num_embed (int, optional) – The dictionary size of col embeddings. Defaults to 50.
- forward(mask)
Forward function for LearnedPositionalEncoding.
- Parameters:
mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions and zero values mark valid positions for this image. Shape [bs, h, w].
- Returns:
Position embedding with shape [bs, num_feats*2, h, w].
- Return type:
Tensor
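As a rough illustration of what a learned encoding of this kind computes, the following dependency-free Python sketch uses plain lists in place of tensors. It assumes, following the mmdetection design this module is modified from, that the mask only supplies the spatial size and that column (x) features are placed before row (y) features. The helper names `make_table` and `learned_encoding` are hypothetical and not part of vis4d.

```python
import random


def make_table(num_embed, num_feats, rng):
    """Stand-in for a learnable table: num_embed entries of num_feats values."""
    return [[rng.uniform(0.0, 1.0) for _ in range(num_feats)]
            for _ in range(num_embed)]


def learned_encoding(h, w, row_embed, col_embed):
    """Return pos[c][y][x] with 2 * num_feats channels for an h x w map.

    Assumption: column (x) features come first, then row (y) features.
    """
    if h > len(row_embed) or w > len(col_embed):
        raise ValueError("feature map is larger than the embedding tables")
    num_feats = len(row_embed[0])
    pos = [[[0.0] * w for _ in range(h)] for _ in range(2 * num_feats)]
    for y in range(h):
        for x in range(w):
            # Look up one vector per column index and one per row index.
            feats = col_embed[x] + row_embed[y]
            for c, value in enumerate(feats):
                pos[c][y][x] = value
    return pos


rng = random.Random(0)
num_feats = 4
row_embed = make_table(50, num_feats, rng)  # row_num_embed = 50
col_embed = make_table(50, num_feats, rng)  # col_num_embed = 50
learned_pos = learned_encoding(3, 5, row_embed, col_embed)
```

Because the tables hold row_num_embed and col_num_embed entries, a lookup scheme like this one can only encode feature maps of at most that height and width (50 by 50 with the defaults).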
- class SinePositionalEncoding(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)
Position encoding with sine and cosine functions.
See "End-to-End Object Detection with Transformers" (the DETR paper) for details.
Initialization for SinePositionalEncoding.
- Parameters:
num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.
temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.
normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.
scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
offset (float, optional) – Offset added to the embedding during normalization. Defaults to 0.
- forward(mask)
Forward function for SinePositionalEncoding.
- Parameters:
mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions and zero values mark valid positions for this image. Shape [bs, h, w].
- Returns:
Position embedding with shape [bs, num_feats*2, h, w].
- Return type:
Tensor
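To make the roles of temperature, normalize, scale, eps and offset more concrete, here is a minimal pure-Python sketch of the sine/cosine recipe used in DETR-style encoders, written for a single h x w mask with nested lists instead of tensors. It is an assumption that vis4d follows this recipe step by step, including the channel order (y features first, then x features). The function name `sine_encoding_2d` is hypothetical.

```python
import math


def sine_encoding_2d(mask, num_feats, temperature=10000, normalize=False,
                     scale=2 * math.pi, eps=1e-6, offset=0.0):
    """Return pos[c][y][x] with 2 * num_feats channels for one h x w mask."""
    h, w = len(mask), len(mask[0])
    # Positions are cumulative counts of valid (zero) mask cells.
    y_embed = [[0.0] * w for _ in range(h)]
    x_embed = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            valid = 0.0 if mask[y][x] else 1.0
            y_embed[y][x] = valid + (y_embed[y - 1][x] if y > 0 else 0.0)
            x_embed[y][x] = valid + (x_embed[y][x - 1] if x > 0 else 0.0)
    if normalize:
        # Divide by the last cumulative value so positions span [0, scale].
        y_last = list(y_embed[h - 1])
        x_last = [row[w - 1] for row in x_embed]
        for y in range(h):
            for x in range(w):
                y_embed[y][x] = (y_embed[y][x] + offset) / (y_last[x] + eps) * scale
                x_embed[y][x] = (x_embed[y][x] + offset) / (x_last[y] + eps) * scale
    # One frequency per channel pair: temperature ** (2 * (i // 2) / num_feats).
    dim_t = [temperature ** (2 * (i // 2) / num_feats) for i in range(num_feats)]

    def channels(value):
        # Even channels use sine, odd channels use cosine.
        return [math.sin(value / d) if i % 2 == 0 else math.cos(value / d)
                for i, d in enumerate(dim_t)]

    pos = [[[0.0] * w for _ in range(h)] for _ in range(2 * num_feats)]
    for y in range(h):
        for x in range(w):
            feats = channels(y_embed[y][x]) + channels(x_embed[y][x])
            for c, value in enumerate(feats):
                pos[c][y][x] = value
    return pos


mask = [[0, 0, 0],
        [0, 0, 1]]  # the bottom-right cell is ignored
pos = sine_encoding_2d(mask, num_feats=4)
```

In this sketch the ignored cell adds nothing to the cumulative counts, so it receives the same position value as its nearest preceding valid neighbour along each axis.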
- class SinePositionalEncoding3D(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)
3D Position encoding with sine and cosine functions.
Initialization for SinePositionalEncoding3D.
- Parameters:
num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.
temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.
normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.
scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
offset (float, optional) – Offset added to the embedding during normalization. Defaults to 0.
- forward(mask)
Forward function for SinePositionalEncoding3D.
- Parameters:
mask (Tensor) – ByteTensor mask. Non-zero values mark ignored positions and zero values mark valid positions for this image. Shape [bs, t, h, w].
- Returns:
Position embedding with shape [bs, num_feats*2, h, w].
- Return type:
Tensor
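For a mask with an extra temporal axis, a natural first step is to derive one position index per axis. The sketch below does this for a single [t][h][w] mask with plain lists, assuming that positions are cumulative counts of valid cells as in the DETR-style 2D recipe. How vis4d combines the three indices into the final embedding is not shown here, and the function name `cumulative_positions` is hypothetical.

```python
def cumulative_positions(mask):
    """Return (z, y, x) index grids for one [t][h][w] mask.

    Each grid holds the running count of valid (zero) cells along its axis.
    """
    t, h, w = len(mask), len(mask[0]), len(mask[0][0])
    z = [[[0] * w for _ in range(h)] for _ in range(t)]
    y = [[[0] * w for _ in range(h)] for _ in range(t)]
    x = [[[0] * w for _ in range(h)] for _ in range(t)]
    for k in range(t):
        for j in range(h):
            for i in range(w):
                valid = 0 if mask[k][j][i] else 1
                z[k][j][i] = valid + (z[k - 1][j][i] if k > 0 else 0)
                y[k][j][i] = valid + (y[k][j - 1][i] if j > 0 else 0)
                x[k][j][i] = valid + (x[k][j][i - 1] if i > 0 else 0)
    return z, y, x


# Two frames of size 2 x 2; one cell in the first frame is ignored.
mask3d = [[[0, 0], [0, 1]],
          [[0, 0], [0, 0]]]
z_idx, y_idx, x_idx = cumulative_positions(mask3d)
```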