vis4d.op.layer

Init layers module.

class Attention(dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0)[source]

ViT Attention Layer.

Modified from timm (https://github.com/huggingface/pytorch-image-models).

Init attention layer.

Parameters:
  • dim (int) – Input tensor’s dimension.

  • num_heads (int, optional) – Number of attention heads. Defaults to 8.

  • qkv_bias (bool, optional) – Whether to add bias to qkv. Defaults to False.

  • attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.

  • proj_drop (float, optional) – Dropout rate for projection. Defaults to 0.0.

__call__(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape (B, N, dim).

Returns:

Output tensor of the same shape as input.

Return type:

Tensor

forward(x)[source]

Forward pass.

Return type:

Tensor
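
Example (a minimal usage sketch based on the signature and shapes documented above; the tensor sizes are illustrative):

>>> import torch
>>> from vis4d.op.layer import Attention
>>> attn = Attention(dim=256, num_heads=8, qkv_bias=True)
>>> tokens = torch.randn(2, 196, 256)  # (B, N, dim)
>>> out = attn(tokens)                 # same shape as the input
>>> out.shape
torch.Size([2, 196, 256])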

class CSPLayer(in_channels, out_channels, expand_ratio=0.5, num_blocks=1, add_identity=True)[source]

Cross Stage Partial Layer.

Parameters:
  • in_channels (int) – The input channels of the CSP layer.

  • out_channels (int) – The output channels of the CSP layer.

  • expand_ratio (float, optional) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • num_blocks (int, optional) – Number of blocks. Defaults to 1.

  • add_identity (bool, optional) – Whether to add identity in blocks. Defaults to True.

Init.

forward(features)[source]

Forward pass.

Parameters:

features (torch.Tensor) – Input features.

Return type:

Tensor
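
Example (a minimal sketch; the docstring does not spell out the input layout, so an NCHW feature map is assumed here):

>>> import torch
>>> from vis4d.op.layer import CSPLayer
>>> csp = CSPLayer(in_channels=64, out_channels=128, num_blocks=2)
>>> feats = torch.randn(1, 64, 32, 32)  # assumed (B, C, H, W) feature map
>>> out = csp(feats)                    # (B, out_channels, H, W) expected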

class Conv2d(*args, norm=None, activation=None, **kwargs)[source]

Wrapper around Conv2d to support empty inputs and norm/activation.

Creates an instance of the class.

If norm is specified, its weight is initialized to 1.0 and its bias to 0.0.

forward(x)[source]

Forward pass.

Return type:

Tensor
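
Example (a minimal sketch; it assumes the positional and keyword arguments are forwarded to torch.nn.Conv2d, as the wrapper description suggests):

>>> import torch
>>> from torch import nn
>>> from vis4d.op.layer import Conv2d
>>> conv = Conv2d(3, 16, kernel_size=3, padding=1,
...               norm=nn.BatchNorm2d(16), activation=nn.ReLU(inplace=True))
>>> out = conv(torch.randn(2, 3, 64, 64))  # convolution, then norm, then activation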

class DeformConv(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, norm=None, activation=None)[source]

Wrapper around Deformable Convolution operator with norm/activation.

If norm is specified, its weight is initialized to 1.0 and its bias to 0.0.

Creates an instance of the class.

Parameters:
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • kernel_size (int) – Size of convolutional kernel.

  • stride (int, optional) – Stride of convolutional layer. Defaults to 1.

  • padding (int, optional) – Padding of convolutional layer. Defaults to 0.

  • dilation (int, optional) – Dilation of convolutional layer. Defaults to 1.

  • groups (int, optional) – Number of deformable groups. Defaults to 1.

  • bias (bool, optional) – Whether to use bias in convolutional layer. Defaults to True.

  • norm (nn.Module, optional) – Normalization layer. Defaults to None.

  • activation (nn.Module, optional) – Activation layer. Defaults to None.

forward(input_x)[source]

Forward.

Return type:

Tensor

init_weights()[source]

Initialize weights of offset conv layer.

Return type:

None
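
Example (a minimal sketch; an NCHW input is assumed):

>>> import torch
>>> from vis4d.op.layer import DeformConv
>>> dconv = DeformConv(in_channels=64, out_channels=64, kernel_size=3, padding=1)
>>> out = dconv(torch.randn(1, 64, 32, 32))  # assumed (B, C, H, W) input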

class DropPath(drop_prob=0.0, scale_by_keep=True)[source]

DropPath regularizer (Stochastic Depth) per sample.

Init DropPath.

Parameters:
  • drop_prob (float, optional) – Probability of an item to be masked. Defaults to 0.0.

  • scale_by_keep (bool, optional) – Whether to scale the output by the keep probability. Defaults to True.

__call__(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape [N, …].

Return type:

Tensor

forward(x)[source]

Forward pass.

Return type:

Tensor
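
Example (a minimal sketch; stochastic depth is assumed, as in most dropout-style layers, to be active only in training mode, hence the explicit train() call):

>>> import torch
>>> from vis4d.op.layer import DropPath
>>> drop = DropPath(drop_prob=0.1).train()  # assumed active only in train mode
>>> x = torch.randn(4, 196, 256)            # [N, ...] input
>>> out = drop(x)                           # drops whole samples with probability 0.1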

class PatchEmbed(img_size=224, patch_size=16, in_channels=3, embed_dim=768, norm_layer=None, flatten=True, bias=True)[source]

2D Image to Patch Embedding.

Init PatchEmbed.

Parameters:
  • img_size (int, optional) – Input image’s size. Defaults to 224.

  • patch_size (int, optional) – Patch size. Defaults to 16.

  • in_channels (int, optional) – Number of input image’s channels. Defaults to 3.

  • embed_dim (int, optional) – Patch embedding’s dim. Defaults to 768.

  • norm_layer (nn.Module, optional) – Normalization layer. Defaults to None, which means no normalization layer.

  • flatten (bool, optional) – Whether to flatten the output tensor. Defaults to True.

  • bias (bool, optional) – Whether to add bias to the convolution layer. Defaults to True.

Raises:

ValueError – If the input image’s size is not divisible by the patch size.

__call__(data)[source]

Applies the layer.

Parameters:

data (torch.Tensor) – Input tensor of shape (B, C, H, W).

Returns:

Output tensor of shape (B, N, C), where N is the number of patches (N = H * W).

Return type:

torch.Tensor

forward(x)[source]

Forward function.

Return type:

Tensor
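
Example (a minimal usage sketch based on the documented shapes):

>>> import torch
>>> from vis4d.op.layer import PatchEmbed
>>> embed = PatchEmbed(img_size=224, patch_size=16, in_channels=3, embed_dim=768)
>>> images = torch.randn(2, 3, 224, 224)  # (B, C, H, W)
>>> patches = embed(images)               # (B, N, embed_dim) with N = (224 // 16) ** 2 = 196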

class ResnetBlockFC(size_in, size_out=None, size_h=None)[source]

Fully connected ResNet Block consisting of two linear layers.

Parameters:
  • size_in (int) – Input dimension.

  • size_out (Optional[int]) – Output dimension. If not specified, same as size_in.

  • size_h (Optional[int]) – Hidden dimension. If not specified, same as min(size_in, size_out).

__call__(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape [N, C].

Return type:

Tensor

forward(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape [N, C].

Return type:

Tensor
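
Example (a minimal usage sketch based on the documented [N, C] input shape):

>>> import torch
>>> from vis4d.op.layer import ResnetBlockFC
>>> block = ResnetBlockFC(size_in=128, size_out=256)
>>> out = block(torch.randn(32, 128))  # [N, size_in] -> [N, size_out]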

class TransformerBlock(dim, num_heads, mlp_ratio=4.0, qkv_bias=False, drop=0.0, attn_drop=0.0, init_values=None, drop_path=0.0, act_layer=GELU(approximate='none'), norm_layer=None)[source]

Transformer block for Vision Transformer.

Init transformer block.

Parameters:
  • dim (int) – Input tensor’s dimension.

  • num_heads (int) – Number of attention heads.

  • mlp_ratio (float, optional) – Ratio of MLP hidden dim to embedding dim. Defaults to 4.0.

  • qkv_bias (bool, optional) – Whether to add bias to qkv. Defaults to False.

  • drop (float, optional) – Dropout rate for attention and projection. Defaults to 0.0.

  • attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.

  • init_values (tuple[float, float] | None, optional) – Initial values for layer scale. Defaults to None.

  • drop_path (float, optional) – Dropout rate for drop path. Defaults to 0.0.

  • act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU.

  • norm_layer (nn.Module, optional) – Normalization layer. If None, use nn.LayerNorm.

__call__(data)[source]

Forward pass.

Parameters:

data (torch.Tensor) – Input tensor of shape (B, N, dim).

Returns:

Output tensor of shape (B, N, dim).

Return type:

torch.Tensor

forward(x)[source]

Forward pass.

Return type:

Tensor
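
Example (a minimal usage sketch based on the documented (B, N, dim) shapes):

>>> import torch
>>> from vis4d.op.layer import TransformerBlock
>>> block = TransformerBlock(dim=768, num_heads=12, mlp_ratio=4.0, qkv_bias=True)
>>> tokens = torch.randn(2, 197, 768)  # (B, N, dim)
>>> out = block(tokens)                # (B, N, dim)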

class TransformerBlockMLP(in_features, hidden_features=None, out_features=None, act_layer=GELU(approximate='none'), bias=True, drop=0.0)[source]

MLP as used in Vision Transformer, MLP-Mixer and related networks.

Init MLP.

Parameters:
  • in_features (int) – Number of input features.

  • hidden_features (int, optional) – Number of hidden features. Defaults to None.

  • out_features (int, optional) – Number of output features. Defaults to None.

  • act_layer (nn.Module, optional) – Activation layer. Defaults to nn.GELU.

  • bias (bool, optional) – Whether to use bias. Defaults to True.

  • drop (float, optional) – Dropout probability. Defaults to 0.0.

__call__(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape [N, C].

Return type:

Tensor

forward(x)[source]

Forward pass.

Parameters:

x (Tensor) – Input tensor of shape [N, C].

Return type:

Tensor
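
Example (a minimal usage sketch based on the documented [N, C] input shape; hidden_features is chosen here as 4x the input width, mirroring the usual ViT MLP ratio):

>>> import torch
>>> from vis4d.op.layer import TransformerBlockMLP
>>> mlp = TransformerBlockMLP(in_features=768, hidden_features=3072, drop=0.1)
>>> out = mlp(torch.randn(196, 768))  # [N, in_features] input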

class UnetDownConv(in_channels, out_channels, pooling=True, activation='ReLU')[source]

Downsamples a feature map by applying two convolutions and maxpool.

Creates a new downsampling convolution operator.

This operator consists of two convolutions followed by a maxpool operator.

Parameters:
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • pooling (bool) – Whether pooling should be applied.

  • activation (str) – Activation that should be applied.

__call__(data)[source]

Applies the operator.

Parameters:

data (Tensor) – Input data.

Returns:

Containing the features before the pooling operation (features) and after (pooled_features).

Return type:

UnetDownConvOut

forward(data)[source]

Applies the operator.

Parameters:

data (Tensor) – Input data.

Returns:

Containing the features before the pooling operation (features) and after (pooled_features).

Return type:

UnetDownConvOut
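
Example (a minimal sketch; an NCHW input is assumed, and the output fields follow the names given above):

>>> import torch
>>> from vis4d.op.layer import UnetDownConv
>>> down = UnetDownConv(in_channels=3, out_channels=64, pooling=True)
>>> out = down(torch.randn(1, 3, 128, 128))            # assumed (B, C, H, W) input
>>> feats, pooled = out.features, out.pooled_features  # before / after max pooling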

class UnetUpConv(in_channels, out_channels, merge_mode='concat', up_mode='transpose')[source]

An operator that performs 2 convolutions and 1 UpConvolution.

A ReLU activation follows each convolution.

Creates a new UpConv operator.

This operator merges two inputs by upsampling one and combining it with the other.

Parameters:
  • in_channels (int) – Number of input channels (low res)

  • out_channels (int) – Number of output channels (high res)

  • merge_mode (str) – How to merge both input channels

  • up_mode (str) – How to upsample the channel with lower resolution

Raises:

ValueError – If upsampling mode is unknown

__call__(from_down, from_up)[source]

Forward pass.

Parameters:
  • from_down (Tensor) – Tensor from the encoder pathway. Assumed to have dimension ‘out_channels’

  • from_up (Tensor) – Upconv’d tensor from the decoder pathway. Assumed to have dimension ‘in_channels’

Return type:

Tensor

forward(from_down, from_up)[source]

Forward pass.

Parameters:
  • from_down (Tensor) – Tensor from the encoder pathway. Assumed to have dimension ‘out_channels’

  • from_up (Tensor) – Upconv’d tensor from the decoder pathway. Assumed to have dimension ‘in_channels’

Return type:

Tensor
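
Example (a minimal sketch; the spatial sizes are chosen so that the upsampled decoder tensor is assumed to match the encoder tensor after one 2x upsampling step):

>>> import torch
>>> from vis4d.op.layer import UnetUpConv
>>> up = UnetUpConv(in_channels=128, out_channels=64, merge_mode='concat', up_mode='transpose')
>>> from_down = torch.randn(1, 64, 64, 64)  # encoder features ('out_channels', high res)
>>> from_up = torch.randn(1, 128, 32, 32)   # decoder features ('in_channels', low res)
>>> out = up(from_down, from_up)            # merged and convolved, 'out_channels' channels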

add_conv_branch(num_branch_convs, last_layer_dim, conv_out_dim, conv_has_bias, norm_cfg, num_groups)[source]

Init conv branch for head.

Return type:

tuple[ModuleList, int]
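
A hypothetical call sketch; the accepted types for norm_cfg and num_groups are not spelled out above, so the values below (no normalization, 32 groups) are assumptions:

>>> from vis4d.op.layer import add_conv_branch
>>> convs, last_dim = add_conv_branch(
...     num_branch_convs=2, last_layer_dim=256, conv_out_dim=256,
...     conv_has_bias=False, norm_cfg=None, num_groups=32,
... )
>>> # convs is the ModuleList of stacked convs; last_dim is the branch's output dimension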

Modules

vis4d.op.layer.attention

Attention layer.

vis4d.op.layer.conv2d

Wrapper for conv2d.

vis4d.op.layer.csp_layer

Cross Stage Partial Layer.

vis4d.op.layer.deform_conv

Wrapper for deformable convolution.

vis4d.op.layer.drop

DropPath (Stochastic Depth) regularization layers.

vis4d.op.layer.mlp

MLP Layers.

vis4d.op.layer.ms_deform_attn

Multi-Scale Deformable Attention Module.

vis4d.op.layer.patch_embed

Image to Patch Embedding using Conv2d.

vis4d.op.layer.positional_encoding

Positional encoding for transformer.

vis4d.op.layer.transformer

Transformer layer.

vis4d.op.layer.util

Utility functions for layer ops.

vis4d.op.layer.weight_init

Model weight initialization.