vis4d.op.base.unet

U-Net implementation based on https://arxiv.org/abs/1505.04597.

Code taken from https://github.com/jaxony/unet-pytorch/blob/master/model.py and modified to include typing and custom ops.

Classes

UNet(num_classes[, in_channels, depth, ...])

The U-Net is a convolutional encoder-decoder neural network.

UNetOut(logits, intermediate_features)

Output of the UNet operator.

class UNet(num_classes, in_channels=3, depth=5, start_filts=32, up_mode='transpose', merge_mode='concat')[source]

The U-Net is a convolutional encoder-decoder neural network.

Contextual spatial information (from the decoding, expansive pathway) about an input tensor is merged with information representing the localization of details (from the encoding, compressive pathway).

Modifications to the original paper:

  1. padding is used in 3x3 convolutions to prevent loss of border pixels

  2. merging outputs does not require cropping due to (1)

  3. residual connections can be used by specifying UNet(merge_mode='add')

  4. if non-parametric upsampling is used in the decoder pathway (specified by up_mode='upsample'), then an additional 1x1 2D convolution occurs after upsampling to reduce channel dimensionality by a factor of 2. With transpose convolution (specified by up_mode='transpose'), this channel halving is performed by the transpose convolution itself.
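The effect of the padding and merge choices on tensor shapes can be checked with a little arithmetic. The sketch below is not part of vis4d: `conv_out_size` and `merged_channels` are hypothetical helpers that apply the standard convolution output-size formula and count channels after a merge.

```python
def conv_out_size(size: int, kernel: int = 3, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size of a 2D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def merged_channels(up_channels: int, skip_channels: int, merge_mode: str) -> int:
    """Channel count after merging decoder and encoder feature maps."""
    if merge_mode == "concat":
        # Concatenation stacks the two maps along the channel axis.
        return up_channels + skip_channels
    if merge_mode == "add":
        # Element-wise addition needs matching channel counts.
        if up_channels != skip_channels:
            raise ValueError("'add' requires equal channel counts")
        return up_channels
    raise ValueError(f"unknown merge_mode: {merge_mode!r}")


# Unpadded 3x3 convolution, as in the original paper: 2 pixels lost per axis.
print(conv_out_size(572, kernel=3, padding=0))  # 570
# Padded 3x3 convolution: the spatial size is preserved, so no cropping is needed.
print(conv_out_size(572, kernel=3, padding=1))  # 572
print(merged_channels(64, 64, "concat"))  # 128
print(merged_channels(64, 64, "add"))  # 64
```

Because padded convolutions keep the spatial size, encoder and decoder feature maps at the same level already match, which is why modification (2) follows from (1).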

Unet Operator.

Parameters:
  • in_channels (int) – Number of channels in the input tensor. Default is 3 for RGB images.

  • num_classes (int) – Number of output classes.

  • depth (int) – Number of MaxPools in the U-Net.

  • start_filts (int) – Number of convolutional filters for the first conv.

  • up_mode (str) – Type of upconvolution. Choices: ‘transpose’ for transpose convolution or ‘upsample’ for nearest neighbour upsampling.

  • merge_mode (str) – How to merge features. Choices: ‘concat’ or ‘add’.

Raises:

ValueError – if invalid modes are provided
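As a rough illustration of how these parameters interact, the sketch below validates the two mode strings and lists the filter count per encoder level. It is not the vis4d code: `check_modes` and `encoder_channels` are hypothetical helpers, and the schedule assumes that `depth` counts encoder levels and that the filter count doubles at each level, as in the referenced jaxony/unet-pytorch model.

```python
VALID_UP_MODES = ("transpose", "upsample")
VALID_MERGE_MODES = ("concat", "add")


def check_modes(up_mode: str, merge_mode: str) -> None:
    """Raise ValueError for mode strings outside the documented choices."""
    if up_mode not in VALID_UP_MODES:
        raise ValueError(f"invalid up_mode {up_mode!r}, expected one of {VALID_UP_MODES}")
    if merge_mode not in VALID_MERGE_MODES:
        raise ValueError(f"invalid merge_mode {merge_mode!r}, expected one of {VALID_MERGE_MODES}")


def encoder_channels(depth: int = 5, start_filts: int = 32) -> list:
    """Filters per encoder level, assuming doubling at every level."""
    return [start_filts * 2**level for level in range(depth)]


check_modes("transpose", "concat")  # documented defaults, no error
print(encoder_channels())  # [32, 64, 128, 256, 512]
```

Under the same assumption, a smaller `start_filts` or `depth` shrinks the widest layer quickly: `encoder_channels(depth=3, start_filts=16)` gives `[16, 32, 64]`.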

__call__(data)[source]

Applies the UNet.

Parameters:

data (tensor) – Input images to the network, with shape [N, C, W, H]

Return type:

UNetOut

forward(data)[source]

Applies the UNet.

Parameters:

data (tensor) – Input images to the network, with shape [N, C, W, H]

Return type:

UNetOut

class UNetOut(logits: torch.Tensor, intermediate_features: list[torch.Tensor])[source]

Output of the UNet operator.

logits: Final output of the network without applying softmax.

intermediate_features: Intermediate features of the upsampling path at different scales.

Create new instance of UNetOut(logits, intermediate_features)

intermediate_features: list[Tensor]

Alias for field number 1

logits: Tensor

Alias for field number 0
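The "alias for field number" entries indicate a named-tuple layout, so each field can be read by name, by index, or by unpacking. The sketch below uses a hypothetical stand-in class, `UNetOutSketch`, with plain lists in place of `torch.Tensor` values so that it runs without vis4d or PyTorch installed.

```python
from typing import NamedTuple


class UNetOutSketch(NamedTuple):
    """Stand-in for UNetOut; plain lists replace tensors for illustration."""

    logits: list
    intermediate_features: list


out = UNetOutSketch(
    logits=[0.1, 0.9],
    intermediate_features=[[1.0], [2.0, 3.0]],
)

# Field 0 is logits and field 1 is intermediate_features.
assert out[0] is out.logits
assert out[1] is out.intermediate_features

# Tuple unpacking follows the same field order.
logits, features = out
print(len(features))  # 2
```

Since the documented logits are returned without softmax, a caller that needs class probabilities would apply softmax over the class dimension itself.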