vis4d.op.layer.transformer

Transformer layer.

Modified from timm (https://github.com/huggingface/pytorch-image-models) and mmdetection (https://github.com/open-mmlab/mmdetection).

Functions

`get_clones`(module, num)	Create num identical layers.
`inverse_sigmoid`(x[, eps])	Inverse function of sigmoid.

Classes

`FFN`([embed_dims, feedforward_channels, ...])	Implements feed-forward networks (FFNs) with identity connection.
`LayerScale`(dim[, inplace, data_format, ...])	Layer scaler.
`TransformerBlock`(dim, num_heads[, ...])	Transformer block for Vision Transformer.

class FFN(embed_dims=256, feedforward_channels=1024, num_fcs=2, dropout=0.0, activation='ReLU', inplace=True, dropout_layer=None, add_identity=True, layer_scale_init_value=0.0)

Implements feed-forward networks (FFNs) with identity connection.

Init FFN.

Parameters:

embed_dims (int) – The feature dimension. Defaults to 256.
feedforward_channels (int) – The hidden dimension of FFNs. Defaults to 1024.
num_fcs (int) – The number of fully-connected layers in FFNs. Defaults to 2.
dropout (float) – The dropout rate of FFNs. Defaults to 0.0.
activation (str) – The activation function of FFNs. Defaults to 'ReLU'.
inplace (bool) – Whether the activation operates in-place. Defaults to True.
dropout_layer (nn.Module | None, optional) – The dropout layer used when adding the shortcut. If None, Identity is used. Defaults to None.
add_identity (bool, optional) – Whether to add the identity connection. Defaults to True.
layer_scale_init_value (float) – Initial value of the scale factor in LayerScale. Defaults to 0.0.

forward(x, identity=None)

Forward function for FFN.

If identity is None, x itself is used as the identity connection and added to the output tensor.

Return type: Tensor
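The identity connection described above can be illustrated with a minimal sketch. This is not the vis4d implementation: `FFNSketch` is a hypothetical name, it fixes the number of fully-connected layers at two, uses ReLU, and leaves out `dropout_layer` and LayerScale.

```python
import torch
from torch import nn


class FFNSketch(nn.Module):
    """Illustrative two-layer FFN with an identity connection (hypothetical)."""

    def __init__(self, embed_dims=256, feedforward_channels=1024,
                 dropout=0.0, add_identity=True):
        super().__init__()
        # Expand to the hidden width, apply the activation, project back.
        self.layers = nn.Sequential(
            nn.Linear(embed_dims, feedforward_channels),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(feedforward_channels, embed_dims),
            nn.Dropout(dropout),
        )
        self.add_identity = add_identity

    def forward(self, x, identity=None):
        out = self.layers(x)
        if not self.add_identity:
            return out
        # Fall back to the input itself when no identity tensor is given.
        if identity is None:
            identity = x
        return identity + out


ffn = FFNSketch().eval()
x = torch.randn(2, 10, 256)                         # (B, N, embed_dims)
out = ffn(x)                                        # shortcut taken from x
out_zero = ffn(x, identity=torch.zeros_like(x))     # shortcut contributes nothing
```

Passing an explicit `identity` is useful when the shortcut should come from a tensor other than the FFN input, for example the input before positional encodings were added.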

class LayerScale(dim, inplace=False, data_format='channels_last', init_values=1e-05)

Layer scaler.

Init layer scaler.

Parameters:

dim (int) – Input tensor’s dimension.
inplace (bool) – Whether to perform the operation in-place. Defaults to False.
data_format (str) – The input data format, either ‘channels_last’ or ‘channels_first’, representing (B, N, C) and (B, C, H, W) format data respectively. Defaults to ‘channels_last’.
init_values (float, optional) – Initial values for layer scale. Defaults to 1e-5.

forward(x)

Forward pass.

Return type: Tensor
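To make the two data formats concrete, here is a small sketch of per-channel scaling. The function `layer_scale_sketch` and the tensor names are hypothetical, and the broadcasting scheme is an assumption about how such a layer is typically written, not a copy of the vis4d code.

```python
import torch


def layer_scale_sketch(x, gamma, data_format="channels_last"):
    """Multiply x by a learnable per-channel factor gamma of shape (C,)."""
    if data_format == "channels_first":
        # Channels sit at dim 1, e.g. (B, C, H, W): view gamma as (1, C, 1, 1).
        shape = (1, -1) + (1,) * (x.dim() - 2)
    else:
        # Channels sit at the last dim, e.g. (B, N, C): view gamma as (1, 1, C).
        shape = (1,) * (x.dim() - 1) + (-1,)
    return x * gamma.view(shape)


gamma = torch.full((8,), 1e-5)     # one scale per channel, C = 8
tokens = torch.ones(2, 16, 8)      # (B, N, C)
feats = torch.ones(2, 8, 4, 4)     # (B, C, H, W)

scaled_tokens = layer_scale_sketch(tokens, gamma, "channels_last")
scaled_feats = layer_scale_sketch(feats, gamma, "channels_first")
```

A small initial value such as 1e-5 keeps the scaled branch close to zero at the start of training, so a residual block that uses it initially behaves almost like an identity mapping.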

class TransformerBlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, drop=0.0, attn_drop=0.0, init_values=None, drop_path=0.0, act_layer=GELU(approximate='none'), norm_layer=None)

Transformer block for Vision Transformer.

Init transformer block.

Parameters:

dim (int) – Input tensor’s dimension.
num_heads (int) – Number of attention heads.
mlp_ratio (float, optional) – Ratio of MLP hidden dim to embedding dim. Defaults to 4.0.
qkv_bias (bool, optional) – Whether to add a bias to the qkv projection. Defaults to False.
drop (float, optional) – Dropout rate for attention and projection. Defaults to 0.0.
attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.
init_values (tuple[float, float] | None, optional) – Initial values for layer scale. Defaults to None.
drop_path (float, optional) – Dropout rate for drop path. Defaults to 0.0.
act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU.
norm_layer (nn.Module, optional) – Normalization layer. If None, use nn.LayerNorm.

__call__(data)

Forward pass.

Parameters: data (torch.Tensor) – Input tensor of shape (B, N, dim).
Returns: Output tensor of shape (B, N, dim).
Return type: torch.Tensor

forward(x)

Forward pass.

Return type: Tensor
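As a rough illustration of what a Vision Transformer block computes, the sketch below assumes the pre-norm residual layout used in timm (normalize, attend, add; normalize, MLP, add). `BlockSketch` is a hypothetical name. It omits layer scale, drop path, and dropout, and it uses `nn.MultiheadAttention`, whose projections include a bias by default (unlike `qkv_bias=False` above), so it should not be read as the vis4d implementation.

```python
import torch
from torch import nn


class BlockSketch(nn.Module):
    """Illustrative pre-norm transformer block (hypothetical, simplified)."""

    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)  # MLP hidden width = mlp_ratio * dim
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        # Self-attention sub-layer with a residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-layer with a residual connection.
        return x + self.mlp(self.norm2(x))


block = BlockSketch(dim=64, num_heads=4).eval()
tokens = torch.randn(2, 16, 64)    # (B, N, dim)
block_out = block(tokens)          # same shape as the input
```

Both sub-layers preserve the (B, N, dim) shape, which is why such blocks can be stacked without any reshaping in between.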

get_clones(module, num)

Create num identical copies of module.

Return type: ModuleList
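A common way to build such a list is to deep-copy the module. The sketch below assumes that behavior; `get_clones_sketch` is a hypothetical stand-in, and the source does not state whether the clones share parameters.

```python
import copy

import torch
from torch import nn


def get_clones_sketch(module, num):
    """Return a ModuleList holding `num` independent deep copies of `module`."""
    return nn.ModuleList([copy.deepcopy(module) for _ in range(num)])


layers = get_clones_sketch(nn.Linear(4, 4), 3)

# Under the deep-copy assumption the clones start with equal weights
# but own separate parameter tensors, so they diverge during training.
same_values = torch.equal(layers[0].weight, layers[1].weight)
shared_storage = layers[0].weight is layers[1].weight
```

With deep copies, `same_values` is true and `shared_storage` is false right after construction.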

inverse_sigmoid(x, eps=1e-05)

Inverse function of sigmoid.

Parameters:

x (Tensor) – The tensor to apply the inverse sigmoid to.
eps (float) – Small value used to avoid numerical overflow. Defaults to 1e-5.

Returns:

The result of applying the inverse sigmoid function to x, with the same shape as the input.

Return type:

Tensor
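The inverse of the sigmoid is the logit, log(x / (1 - x)), which diverges at x = 0 and x = 1. The sketch below shows one way `eps` can guard against that, assuming the clamping scheme used in mmdetection (from which this module says it is modified). `inverse_sigmoid_sketch` is a hypothetical name, not the vis4d function.

```python
import torch


def inverse_sigmoid_sketch(x, eps=1e-5):
    """Numerically safe logit: log(x / (1 - x)) with clamped operands."""
    x = x.clamp(min=0, max=1)
    x1 = x.clamp(min=eps)          # numerator is at least eps
    x2 = (1 - x).clamp(min=eps)    # denominator is at least eps
    return torch.log(x1 / x2)


logits = torch.tensor([-3.0, 0.0, 3.0])
recovered = inverse_sigmoid_sketch(torch.sigmoid(logits))

# At the boundaries the result is finite, roughly -11.51 and 11.51 for eps=1e-5.
edges = inverse_sigmoid_sketch(torch.tensor([0.0, 0.5, 1.0]))
```

Away from the boundaries the function round-trips through `torch.sigmoid`, so `recovered` matches `logits` up to floating-point error.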