vis4d.op.layer.transformer
Transformer layer.
Modified from timm (https://github.com/huggingface/pytorch-image-models) and mmdetection (https://github.com/open-mmlab/mmdetection).
Functions
Create N identical layers.
Inverse function of sigmoid.
Classes
FFN | Implements feed-forward networks (FFNs) with identity connection.
LayerScale | Layer scaler.
TransformerBlock | Transformer block for Vision Transformer.
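The two helper functions summarized under Functions can be illustrated with a minimal, standard-library-only sketch. The names `inverse_sigmoid_sketch` and `clone_layers_sketch`, the `eps` clamp, and the `ToyLayer` stand-in are all assumptions made for illustration; this page does not show the real function names or signatures, and the library versions presumably operate on tensors and `nn.Module` objects rather than floats and plain objects.

```python
import copy
import math


def inverse_sigmoid_sketch(x: float, eps: float = 1e-5) -> float:
    """Return log(x / (1 - x)); eps (an assumed safeguard) avoids log(0)."""
    x = min(max(x, 0.0), 1.0)  # Clamp the input to the valid range [0, 1].
    numerator = max(x, eps)
    denominator = max(1.0 - x, eps)
    return math.log(numerator / denominator)


def clone_layers_sketch(layer, n: int) -> list:
    """Return n independent deep copies of the given layer object."""
    return [copy.deepcopy(layer) for _ in range(n)]


class ToyLayer:
    """Hypothetical stand-in for a layer that owns its parameters."""

    def __init__(self) -> None:
        self.weight = [0.0, 0.0]


layers = clone_layers_sketch(ToyLayer(), 3)
layers[0].weight[0] = 1.0  # Mutating one copy leaves the other copies untouched.
```

Deep copying is what makes the clones "identical" only at creation time: each copy has its own parameters afterwards, so stacked layers do not share weights.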
- class FFN(embed_dims=256, feedforward_channels=1024, num_fcs=2, dropout=0.0, activation='ReLU', inplace=True, dropout_layer=None, add_identity=True, layer_scale_init_value=0.0)
Implements feed-forward networks (FFNs) with identity connection.
Init FFN.
- Parameters:
embed_dims (int) – The feature dimension. Defaults to 256.
feedforward_channels (int) – The hidden dimension of FFNs. Defaults to 1024.
num_fcs (int) – The number of fully-connected layers in FFNs. Defaults to 2.
dropout (float) – The dropout rate of FFNs. Defaults to 0.0.
activation (str) – The activation function of FFNs. Defaults to 'ReLU'.
inplace (bool) – Whether to run the activation in-place. Defaults to True.
dropout_layer (nn.Module | None, optional) – The dropout layer used when adding the shortcut. Defaults to None, in which case Identity is used.
add_identity (bool, optional) – Whether to add the identity connection. Defaults to True.
layer_scale_init_value (float) – Initial value of the scale factor in LayerScale. Defaults to 0.0.
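How the main FFN parameters fit together can be sketched in a few lines of PyTorch. This is an illustrative approximation, not the library's implementation: the class name `FFNSketch` is hypothetical, the activation is hard-coded to ReLU, and the `dropout_layer` and `layer_scale_init_value` options are left out.

```python
import torch
from torch import nn


class FFNSketch(nn.Module):
    """Hypothetical stand-in: num_fcs linear layers plus an optional shortcut."""

    def __init__(self, embed_dims=256, feedforward_channels=1024,
                 num_fcs=2, dropout=0.0, add_identity=True):
        super().__init__()
        layers = []
        in_channels = embed_dims
        # The first num_fcs - 1 layers project into the hidden dimension.
        for _ in range(num_fcs - 1):
            layers += [
                nn.Linear(in_channels, feedforward_channels),
                nn.ReLU(inplace=True),
                nn.Dropout(dropout),
            ]
            in_channels = feedforward_channels
        # The last layer projects back to embed_dims so the shortcut can be added.
        layers += [nn.Linear(in_channels, embed_dims), nn.Dropout(dropout)]
        self.layers = nn.Sequential(*layers)
        self.add_identity = add_identity

    def forward(self, x):
        out = self.layers(x)
        return x + out if self.add_identity else out


ffn = FFNSketch().eval()
tokens = torch.randn(2, 10, 256)  # (batch, tokens, embed_dims)
output = ffn(tokens)              # Same shape as the input.
```

Because the final layer maps back to `embed_dims`, the output shape always matches the input, which is what allows the identity connection to be added.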
- class LayerScale(dim, inplace=False, data_format='channels_last', init_values=1e-05)
Layer scaler.
Init layer scaler.
- Parameters:
dim (int) – Number of channels (feature dimension) of the input tensor.
inplace (bool) – Whether to perform the operation in-place. Defaults to False.
data_format (str) – The input data format, either ‘channels_last’ or ‘channels_first’, representing (B, N, C) and (B, C, H, W) format data respectively. Defaults to ‘channels_last’.
init_values (float, optional) – Initial values for layer scale. Defaults to 1e-5.
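The effect of `data_format` is easiest to see as a broadcasting rule. The sketch below is an assumption-laden illustration, not the library's code: `LayerScaleSketch` is a hypothetical name, the `inplace` option is omitted, and it assumes 'channels_last' means the channel axis comes last, as in (B, N, C), while 'channels_first' means (B, C, H, W).

```python
import torch
from torch import nn


class LayerScaleSketch(nn.Module):
    """Hypothetical stand-in: one learnable scale factor per channel."""

    def __init__(self, dim, data_format="channels_last", init_values=1e-5):
        super().__init__()
        assert data_format in ("channels_last", "channels_first")
        self.data_format = data_format
        self.weight = nn.Parameter(torch.ones(dim) * init_values)

    def forward(self, x):
        if self.data_format == "channels_first":
            # (C,) -> (C, 1, 1) so it broadcasts against (B, C, H, W).
            return x * self.weight.view(-1, 1, 1)
        # (C,) broadcasts directly against the last axis of (B, N, C).
        return x * self.weight


tokens = torch.ones(2, 5, 8)          # (B, N, C)
feature_map = torch.ones(2, 8, 4, 4)  # (B, C, H, W)
scaled_tokens = LayerScaleSketch(8)(tokens)
scaled_map = LayerScaleSketch(8, data_format="channels_first")(feature_map)
```

A small `init_values` keeps the scaled branch close to zero at the start of training, so a residual block that wraps it initially behaves almost like an identity mapping.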
- class TransformerBlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, drop=0.0, attn_drop=0.0, init_values=None, drop_path=0.0, act_layer=GELU(approximate='none'), norm_layer=None)
Transformer block for Vision Transformer.
Init transformer block.
- Parameters:
dim (int) – Input tensor’s dimension.
num_heads (int) – Number of attention heads.
mlp_ratio (float, optional) – Ratio of MLP hidden dim to embedding dim. Defaults to 4.0.
qkv_bias (bool, optional) – Whether to add a bias to the qkv projection. Defaults to False.
drop (float, optional) – Dropout rate for attention and projection. Defaults to 0.0.
attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.
init_values (tuple[float, float] | None, optional) – Initial values for layer scale. Defaults to None.
drop_path (float, optional) – Dropout rate for drop path. Defaults to 0.0.
act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU().
norm_layer (nn.Module, optional) – Normalization layer. Defaults to None, in which case nn.LayerNorm is used.
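A rough picture of how these parameters combine, assuming the timm-style pre-norm layout that this module says it is adapted from: normalize, attend, add the residual, then normalize, apply the MLP, and add the residual again. The class `BlockSketch` is hypothetical and uses `torch.nn.MultiheadAttention` in place of the library's own attention layer; layer scale (`init_values`) and `drop_path` are omitted for brevity.

```python
import torch
from torch import nn


class BlockSketch(nn.Module):
    """Hypothetical pre-norm transformer block (attention + MLP residuals)."""

    def __init__(self, dim, num_heads, mlp_ratio=4.0, qkv_bias=False,
                 drop=0.0, attn_drop=0.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Note: bias here also toggles the output projection bias, which is
        # broader than a qkv-only bias switch.
        self.attn = nn.MultiheadAttention(
            dim, num_heads, dropout=attn_drop, bias=qkv_bias, batch_first=True
        )
        self.norm2 = nn.LayerNorm(dim)
        hidden_dim = int(dim * mlp_ratio)  # mlp_ratio sets the MLP width.
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(drop),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(drop),
        )

    def forward(self, x):
        normed = self.norm1(x)
        attn_out, _ = self.attn(normed, normed, normed, need_weights=False)
        x = x + attn_out                       # First residual connection.
        return x + self.mlp(self.norm2(x))     # Second residual connection.


block = BlockSketch(dim=64, num_heads=4).eval()
x = torch.randn(2, 16, 64)  # (batch, tokens, dim)
y = block(x)                # Same shape as the input.
```

Note that `dim` must be divisible by `num_heads`, since the embedding is split evenly across the attention heads.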