vis4d.op.layer
Layers module.
- class Attention(dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0)[source]
ViT Attention Layer.
Modified from timm (https://github.com/huggingface/pytorch-image-models).
Init attention layer.
- Parameters:
dim (int) – Input tensor’s dimension.
num_heads (int, optional) – Number of attention heads. Defaults to 8.
qkv_bias (bool, optional) – Whether to add bias to the qkv projection. Defaults to False.
attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.
proj_drop (float, optional) – Dropout rate for projection. Defaults to 0.0.
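A minimal usage sketch; the (batch, num_tokens, dim) forward interface is an assumption based on the timm-style attention this is modified from:

>>> import torch
>>> from vis4d.op.layer import Attention
>>> attn = Attention(dim=256, num_heads=8, qkv_bias=True)
>>> tokens = torch.randn(2, 196, 256)  # (batch, num_tokens, dim)
>>> out = attn(tokens)  # assumed to preserve the input shape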
- class CSPLayer(in_channels, out_channels, expand_ratio=0.5, num_blocks=1, add_identity=True)[source]
Cross Stage Partial Layer.
- Parameters:
in_channels (int) – The input channels of the CSP layer.
out_channels (int) – The output channels of the CSP layer.
expand_ratio (float, optional) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
num_blocks (int, optional) – Number of blocks. Defaults to 1.
add_identity (bool, optional) – Whether to add identity in blocks. Defaults to True.
Init.
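A minimal usage sketch, assuming a standard (batch, channels, height, width) feature-map interface:

>>> import torch
>>> from vis4d.op.layer import CSPLayer
>>> csp = CSPLayer(in_channels=64, out_channels=128, num_blocks=2)
>>> feats = torch.randn(1, 64, 32, 32)
>>> out = csp(feats)  # expected shape: (1, 128, 32, 32)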
- class Conv2d(*args, norm=None, activation=None, **kwargs)[source]
Wrapper around Conv2d to support empty inputs and norm/activation.
Creates an instance of the class.
If norm is specified, it is initialized with 1.0 and bias with 0.0.
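A minimal usage sketch; it assumes the positional and keyword arguments are forwarded to torch.nn.Conv2d, and that norm and activation are passed as module instances:

>>> import torch
>>> from torch import nn
>>> from vis4d.op.layer import Conv2d
>>> conv = Conv2d(
...     3, 64, kernel_size=3, padding=1,
...     norm=nn.BatchNorm2d(64), activation=nn.ReLU(inplace=True))
>>> out = conv(torch.randn(1, 3, 32, 32))  # expected shape: (1, 64, 32, 32)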
- class DeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, norm=None, activation=None)[source]
Wrapper around Deformable Convolution operator with norm/activation.
If norm is specified, it is initialized with 1.0 and bias with 0.0.
Creates an instance of the class.
- Parameters:
in_channels (int) – Input channels.
out_channels (int) – Output channels.
kernel_size (int) – Size of convolutional kernel.
stride (int, optional) – Stride of convolutional layer. Defaults to 1.
padding (int, optional) – Padding of convolutional layer. Defaults to 0.
dilation (int, optional) – Dilation of convolutional layer. Defaults to 1.
groups (int, optional) – Number of deformable groups. Defaults to 1.
bias (bool, optional) – Whether to use bias in convolutional layer. Defaults to True.
norm (nn.Module, optional) – Normalization layer. Defaults to None.
activation (nn.Module, optional) – Activation layer. Defaults to None.
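A minimal usage sketch; it assumes the wrapper computes the sampling offsets internally, so the forward pass takes a single feature map:

>>> import torch
>>> from vis4d.op.layer import DeformConv
>>> dconv = DeformConv(64, 64, kernel_size=3, padding=1)
>>> out = dconv(torch.randn(1, 64, 32, 32))  # assumed shape: (1, 64, 32, 32)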
- class DropPath(drop_prob=0.0, scale_by_keep=True)[source]
DropPath regularizer (Stochastic Depth) per sample.
Init DropPath.
- Parameters:
drop_prob (float, optional) – Probability of an item to be masked. Defaults to 0.0.
scale_by_keep (bool, optional) – Whether to scale by the keep probability. Defaults to True.
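A minimal usage sketch; as with torch.nn.Dropout, stochastic depth is assumed active only in training mode:

>>> import torch
>>> from vis4d.op.layer import DropPath
>>> drop_path = DropPath(drop_prob=0.1).train()
>>> x = torch.randn(4, 196, 256)
>>> out = drop_path(x)  # each sample kept with probability 0.9 (and rescaled)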
- class PatchEmbed(img_size=224, patch_size=16, in_channels=3, embed_dim=768, norm_layer=None, flatten=True, bias=True)[source]
2D Image to Patch Embedding.
Init PatchEmbed.
- Parameters:
img_size (int, optional) – Input image’s size. Defaults to 224.
patch_size (int, optional) – Patch size. Defaults to 16.
in_channels (int, optional) – Number of input image’s channels. Defaults to 3.
embed_dim (int, optional) – Patch embedding’s dim. Defaults to 768.
norm_layer (nn.Module, optional) – Normalization layer. Defaults to None, which means no normalization layer.
flatten (bool, optional) – Whether to flatten the output tensor. Defaults to True.
bias (bool, optional) – Whether to add bias to the convolution layer. Defaults to True.
- Raises:
ValueError – If the input image’s size is not divisible by the patch size.
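A minimal usage sketch: with the defaults, a 224x224 image is split into 14x14 = 196 patches, each embedded into 768 dimensions (the flattened (batch, num_patches, embed_dim) output assumes flatten=True):

>>> import torch
>>> from vis4d.op.layer import PatchEmbed
>>> embed = PatchEmbed(img_size=224, patch_size=16, embed_dim=768)
>>> out = embed(torch.randn(1, 3, 224, 224))  # expected shape: (1, 196, 768)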
- class ResnetBlockFC(size_in, size_out=None, size_h=None)[source]
Fully connected ResNet Block consisting of two linear layers.
- Parameters:
size_in (int) – Input dimension.
size_out (int, optional) – Output dimension; if not specified, same as the input dimension.
size_h (int, optional) – Hidden dimension; if not specified, same as min(size_in, size_out).
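A minimal usage sketch, assuming the block maps the last dimension from size_in to size_out:

>>> import torch
>>> from vis4d.op.layer import ResnetBlockFC
>>> block = ResnetBlockFC(size_in=128, size_out=64)
>>> out = block(torch.randn(10, 128))  # expected shape: (10, 64)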
- class TransformerBlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, drop=0.0, attn_drop=0.0, init_values=None, drop_path=0.0, act_layer=GELU(approximate='none'), norm_layer=None)[source]
Transformer block for Vision Transformer.
Init transformer block.
- Parameters:
dim (int) – Input tensor’s dimension.
num_heads (int) – Number of attention heads.
mlp_ratio (float, optional) – Ratio of MLP hidden dim to embedding dim. Defaults to 4.0.
qkv_bias (bool, optional) – Whether to add bias to the qkv projection. Defaults to False.
drop (float, optional) – Dropout rate for attention and projection. Defaults to 0.0.
attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.
init_values (tuple[float, float] | None, optional) – Initial values for layer scale. Defaults to None.
drop_path (float, optional) – Dropout rate for drop path. Defaults to 0.0.
act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU.
norm_layer (nn.Module, optional) – Normalization layer. If None, use nn.LayerNorm.
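A minimal usage sketch; the (batch, num_tokens, dim) interface is an assumption, matching the Attention layer above:

>>> import torch
>>> from vis4d.op.layer import TransformerBlock
>>> block = TransformerBlock(dim=256, num_heads=8, mlp_ratio=4.0)
>>> tokens = torch.randn(2, 196, 256)
>>> out = block(tokens)  # assumed to preserve the input shape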
- class TransformerBlockMLP(in_features, hidden_features=None, out_features=None, act_layer=GELU(approximate='none'), bias=True, drop=0.0)[source]
MLP as used in Vision Transformer, MLP-Mixer and related networks.
Init MLP.
- Parameters:
in_features (int) – Number of input features.
hidden_features (int, optional) – Number of hidden features. Defaults to None, in which case in_features is used.
out_features (int, optional) – Number of output features. Defaults to None, in which case in_features is used.
act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU.
bias (bool, optional) – Whether to use bias in the linear layers. Defaults to True.
drop (float, optional) – Dropout probability. Defaults to 0.0.
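A minimal usage sketch, assuming the MLP acts on the last dimension and out_features is left at its default (i.e. in_features):

>>> import torch
>>> from vis4d.op.layer import TransformerBlockMLP
>>> mlp = TransformerBlockMLP(in_features=256, hidden_features=1024)
>>> out = mlp(torch.randn(2, 196, 256))  # expected shape: (2, 196, 256)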
- class UnetDownConv(in_channels, out_channels, pooling=True, activation='ReLU')[source]
Downsamples a feature map by applying two convolutions and maxpool.
Creates a new downsampling convolution operator.
This operator consists of two convolutions followed by a maxpool operator.
- Parameters:
in_channels (int) – Input channels.
out_channels (int) – Output channels.
pooling (bool) – Whether pooling should be applied.
activation (str) – Activation that should be applied.
- __call__(data)[source]
Applies the operator.
- Parameters:
data (Tensor) – Input data.
- Returns:
The features before the pooling operation (features) and after it (pooled_features).
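A minimal usage sketch; the output field names follow the returns description above, and the shapes assume padded convolutions with 2x2 max pooling:

>>> import torch
>>> from vis4d.op.layer import UnetDownConv
>>> down = UnetDownConv(in_channels=3, out_channels=64, pooling=True)
>>> out = down(torch.randn(1, 3, 64, 64))
>>> out.features.shape         # before pooling, assumed (1, 64, 64, 64)
>>> out.pooled_features.shape  # after pooling, assumed (1, 64, 32, 32)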
- class UnetUpConv(in_channels, out_channels, merge_mode='concat', up_mode='transpose')[source]
An operator that performs 2 convolutions and 1 UpConvolution.
A ReLU activation follows each convolution.
Creates a new UpConv operator.
This operator merges two inputs by upsampling one and combining it with the other.
- Parameters:
in_channels (int) – Number of input channels (low resolution).
out_channels (int) – Number of output channels (high resolution).
merge_mode (str) – How to merge both inputs.
up_mode (str) – How to upsample the lower-resolution input.
- Raises:
ValueError – If the upsampling mode is unknown.
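A minimal usage sketch; the forward argument order (high-resolution skip features first, low-resolution features second) is an assumption based on common U-Net implementations:

>>> import torch
>>> from vis4d.op.layer import UnetUpConv
>>> up = UnetUpConv(in_channels=128, out_channels=64, merge_mode="concat")
>>> skip = torch.randn(1, 64, 64, 64)  # high-resolution encoder features
>>> low = torch.randn(1, 128, 32, 32)  # low-resolution decoder features
>>> out = up(skip, low)  # assumed shape: (1, 64, 64, 64)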
- add_conv_branch(num_branch_convs, last_layer_dim, conv_out_dim, conv_has_bias, norm_cfg, num_groups)[source]
Init conv branch for head.
- Return type:
tuple[ModuleList, int]
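A minimal usage sketch; the accepted norm_cfg values and the meaning of the returned int (assumed to be the resulting channel dimension) are assumptions:

>>> from vis4d.op.layer import add_conv_branch
>>> convs, last_dim = add_conv_branch(
...     num_branch_convs=2, last_layer_dim=256, conv_out_dim=256,
...     conv_has_bias=False, norm_cfg=None, num_groups=32)
>>> len(convs)  # assumed: a ModuleList holding two conv layers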
Modules
- Attention layer.
- Wrapper for conv2d.
- Cross Stage Partial Layer.
- Wrapper for deformable convolution.
- DropPath (Stochastic Depth) regularization layers.
- MLP Layers.
- Multi-Scale Deformable Attention Module.
- Image to Patch Embedding using Conv2d.
- Positional encoding for transformer.
- Transformer layer.
- Utility functions for layer ops.
- Model weight initialization.