vis4d.op.base.resnet

Residual networks base model.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Classes

BasicBlock(inplanes, planes[, stride, ...])

BasicBlock.

Bottleneck(inplanes, planes[, stride, ...])

Bottleneck.

ResNet(resnet_name[, in_channels, ...])

ResNet BaseModel.

ResNetV1c(resnet_name[, pretrained, weights])

ResNetV1c variant with a deeper stem.

class BasicBlock(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', use_checkpoint=False, with_dcn=False, norm='BatchNorm2d')

BasicBlock.

Basic residual block with two 3x3 convolutions and an expansion of 1, as used in ResNet-18 and ResNet-34.

forward(x)

Forward function.

Return type:

Tensor
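To make the effect of `stride` and `dilation` on a BasicBlock concrete, here is a minimal sketch that only computes output shapes. It assumes the mmdetection layout (conv1 is a 3x3 conv carrying `stride` and `dilation` with padding equal to the dilation, conv2 is a 3x3 conv with stride 1 and padding 1, expansion 1). The helpers `conv_out_size` and `basic_block_output_shape` are hypothetical and not part of vis4d.

```python
from typing import Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int, dilation: int) -> int:
    """Standard convolution output-size formula (the one used by torch.nn.Conv2d)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def basic_block_output_shape(
    in_shape: Tuple[int, int, int], planes: int, stride: int = 1, dilation: int = 1
) -> Tuple[int, int, int]:
    """Hypothetical helper: (C, H, W) after a BasicBlock (assumed mmdetection layout)."""
    _, h, w = in_shape
    # conv1: 3x3, carries the stride; padding=dilation keeps the size when stride is 1
    h = conv_out_size(h, 3, stride, dilation, dilation)
    w = conv_out_size(w, 3, stride, dilation, dilation)
    # conv2: 3x3, stride 1, padding 1, so the spatial size is preserved
    h = conv_out_size(h, 3, 1, 1, 1)
    w = conv_out_size(w, 3, 1, 1, 1)
    # expansion is 1, so the block outputs `planes` channels
    return (planes, h, w)


print(basic_block_output_shape((64, 56, 56), 128, stride=2))  # (128, 28, 28)
```

Because the residual branch is added to the identity, whenever the output shape differs from the input shape (stride other than 1, or `inplanes != planes`), the `downsample` module has to map the identity to that same shape.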

class Bottleneck(inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', use_checkpoint=False, with_dcn=False, norm='BatchNorm2d')

Bottleneck.

Bottleneck block for ResNet.

If style is “pytorch”, the stride-two layer is the 3x3 conv layer; if it is “caffe”, the stride-two layer is the first 1x1 conv layer.
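The style rule above can be sketched as a small function that lists the three convolutions of a Bottleneck as (kernel, out_channels, stride). It assumes the standard Bottleneck expansion of 4 (the last 1x1 conv outputs `planes * 4` channels); the helper `bottleneck_layers` is hypothetical and only describes the layer plan, it does not build modules.

```python
from typing import List, Tuple

EXPANSION = 4  # assumed: standard ResNet Bottleneck expansion


def bottleneck_layers(planes: int, stride: int, style: str = "pytorch") -> List[Tuple[str, int, int]]:
    """Hypothetical helper: (kernel, out_channels, stride) for each conv of a Bottleneck."""
    if style not in ("pytorch", "caffe"):
        raise ValueError(f"unknown style: {style}")
    # "pytorch": the 3x3 conv carries the stride; "caffe": the first 1x1 conv does
    conv1_stride, conv2_stride = (1, stride) if style == "pytorch" else (stride, 1)
    return [
        ("1x1", planes, conv1_stride),           # reduce channels
        ("3x3", planes, conv2_stride),           # spatial conv
        ("1x1", planes * EXPANSION, 1),          # expand channels
    ]


print(bottleneck_layers(64, 2, "pytorch"))  # [('1x1', 64, 1), ('3x3', 64, 2), ('1x1', 256, 1)]
print(bottleneck_layers(64, 2, "caffe"))    # [('1x1', 64, 2), ('3x3', 64, 1), ('1x1', 256, 1)]
```

Either way the block downsamples by the same factor; the difference is whether the 1x1 reduction sees the full-resolution input (“pytorch”) or an already subsampled one (“caffe”).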

forward(x)

Forward function.

Return type:

Tensor

class ResNet(resnet_name, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), style='pytorch', deep_stem=False, avg_down=False, trainable_layers=5, norm='BatchNorm2d', norm_frozen=True, stages_with_dcn=(False, False, False, False), replace_stride_with_dilation=(False, False, False), use_checkpoint=False, zero_init_residual=True, pretrained=False, weights=None)

ResNet BaseModel.

Create ResNet.

Parameters:
  • resnet_name (str) – Name of the ResNet variant.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.

  • base_channels (int) – Number of base channels of the residual layers. Default: 64.

  • num_stages (int) – Number of ResNet stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • style (str) – Either “pytorch” or “caffe”. If set to “pytorch”, the stride-two layer is the 3x3 conv layer; otherwise, the stride-two layer is the first 1x1 conv layer. Default: pytorch.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.

  • avg_down (bool) – Use AvgPool instead of a strided conv when downsampling in the bottleneck. Default: False.

  • trainable_layers (int, optional) – Number of layers to train or fine-tune. 5 means all layers can be fine-tuned. Default: 5.

  • norm (str) – Name of the normalization layer. Default: BatchNorm2d, which means using nn.BatchNorm2d.

  • norm_frozen (bool) – Whether to set the norm layers to eval mode, which freezes their running statistics (mean and var). Note: this only affects BatchNorm and its variants. Default: True.

  • stages_with_dcn (Sequence[bool]) – Per-stage flags indicating which stages use deformable convolutions. Default: (False, False, False, False).

  • replace_stride_with_dilation (Sequence[bool]) – Whether to replace the stride with dilation. Default: (False, False, False).

  • use_checkpoint (bool) – Whether to use gradient checkpointing, which saves memory at the cost of slower training. Default: False.

  • zero_init_residual (bool) – Whether to zero-initialize the last norm layer in each residual block so that the block initially behaves as an identity mapping. Default: True.

  • pretrained (bool) – Whether to load pretrained weights. Default: False.

  • weights (str, optional) – Path to the pretrained model weights. Default: None.
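The interplay of `strides` and `replace_stride_with_dilation` is easiest to see as the overall output stride of each stage. The sketch below assumes torchvision's convention (flag i applies to stage i + 1, and a set flag turns that stage's stride into extra dilation instead of downsampling) and a stem that downsamples by 4 (7x7 stride-2 conv plus stride-2 max pool). It ignores the separate `dilations` argument, and `stage_output_strides` is a hypothetical helper, not part of vis4d.

```python
from typing import List, Sequence, Tuple

STEM_STRIDE = 4  # assumed: 7x7 stride-2 conv followed by a stride-2 max pool


def stage_output_strides(
    strides: Sequence[int] = (1, 2, 2, 2),
    replace_stride_with_dilation: Sequence[bool] = (False, False, False),
) -> List[Tuple[int, int]]:
    """Hypothetical helper: (output_stride, dilation) reached by each stage.

    Assumes torchvision's convention: the first stage never uses the flag,
    and a set flag multiplies the dilation by the stage stride instead of
    downsampling the feature map.
    """
    flags = (False,) + tuple(replace_stride_with_dilation)
    output_stride, dilation = STEM_STRIDE, 1
    result = []
    for stride, use_dilation in zip(strides, flags):
        if use_dilation:
            dilation *= stride       # keep resolution, grow the receptive field instead
        else:
            output_stride *= stride  # regular strided downsampling
        result.append((output_stride, dilation))
    return result


print(stage_output_strides())                                   # [(4, 1), (8, 1), (16, 1), (32, 1)]
print(stage_output_strides(replace_stride_with_dilation=(False, True, True)))  # [(4, 1), (8, 1), (8, 2), (8, 4)]
```

The second call is the typical setting for dense prediction: the last stages keep an output stride of 8 and compensate with dilations 2 and 4.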

forward(images)

Forward function.

Parameters:

images (Tensor[N, C, H, W]) – Image input to process. Expected to be of type float32 with values in the range 0..255.

Returns:

The output feature pyramid. The list index represents the level, which has a downsampling ratio of 2^index. fp[0] and fp[1] are references to the input images, since the torchvision ResNet downsamples the feature maps by 4 directly. The last feature map downsamples the input image by 64 with a pooling layer on the second-to-last map.

Return type:

fp (list[torch.Tensor])
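A minimal sketch of the nominal level sizes implied by the 2^index rule: fp[0] and fp[1] keep the input size because they are the input images, and each later level i is H / 2^i by W / 2^i. The helper `nominal_level_sizes` is hypothetical; it covers only the first `num_levels` levels and assumes H and W are divisible by the largest ratio, so no rounding occurs.

```python
from typing import List, Tuple


def nominal_level_sizes(height: int, width: int, num_levels: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: nominal (H, W) of each pyramid level fp[i]."""
    sizes = []
    for level in range(num_levels):
        if level < 2:
            # fp[0] and fp[1] are references to the input images
            sizes.append((height, width))
        else:
            ratio = 2 ** level  # downsampling ratio of level `level`
            sizes.append((height // ratio, width // ratio))
    return sizes


print(nominal_level_sizes(512, 512, 4))  # [(512, 512), (512, 512), (128, 128), (64, 64)]
```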

train(mode=True)

Override the train mode for the model.

Return type:

ResNet

property out_channels: list[int]

Get the number of channels for each level of feature pyramid.

Returns:

number of channels

Return type:

list[int]
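As an illustration of what `out_channels` is expected to return, the sketch below computes the channel counts per level. It assumes that fp[0] and fp[1] (the input images) report `in_channels` and that stage i outputs `base_channels * 2**i * expansion` channels, with expansion 1 for BasicBlock and 4 for Bottleneck. `pyramid_out_channels` is a hypothetical helper, not the vis4d implementation.

```python
from typing import List

EXPANSION = {"basic": 1, "bottleneck": 4}  # assumed block expansions


def pyramid_out_channels(
    in_channels: int = 3, base_channels: int = 64, num_stages: int = 4, block: str = "bottleneck"
) -> List[int]:
    """Hypothetical helper: channels of each pyramid level under the stated assumptions."""
    # each stage doubles the width of the previous one
    stage_channels = [base_channels * 2**i * EXPANSION[block] for i in range(num_stages)]
    # the first two levels are the input images themselves
    return [in_channels, in_channels] + stage_channels


print(pyramid_out_channels())                # [3, 3, 256, 512, 1024, 2048], e.g. ResNet-50
print(pyramid_out_channels(block="basic"))   # [3, 3, 64, 128, 256, 512], e.g. ResNet-18
```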

class ResNetV1c(resnet_name, pretrained=False, weights=None, **kwargs)

ResNetV1c variant with a deeper stem.

Compared with the default ResNet, ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs. For more details, please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks (https://arxiv.org/abs/1812.01187).
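One way to see why three 3x3 convs can stand in for the 7x7 stem conv is to compare receptive fields. The sketch below uses the standard recurrence rf += (k - 1) * jump, jump *= stride over a stack of (kernel, stride) convs. It assumes the deep stem puts stride 2 on the first 3x3 conv and stride 1 on the other two, as in the ResNet-C tweak of the cited paper; `receptive_field` is a hypothetical helper.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, total_stride) of a stack of (kernel, stride) convs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input-spaced steps
        jump *= stride             # distance between adjacent outputs, in input pixels
    return rf, jump


default_stem = [(7, 2)]                  # one 7x7 conv with stride 2
deep_stem = [(3, 2), (3, 1), (3, 1)]     # assumed ResNet-C layout: stride only on the first conv

print(receptive_field(default_stem))  # (7, 2)
print(receptive_field(deep_stem))     # (11, 2)
```

Both stems downsample by 2 before the max pool, while the deep stem covers a larger 11x11 window using only 3x3 kernels.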

Initialize ResNetV1c.

Parameters:
  • resnet_name (str) – Name of the resnet model.

  • pretrained (bool, optional) – Whether to load ImageNet pre-trained weights. Defaults to False.

  • weights (str, optional) – Path to custom pretrained weights. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments passed to ResNet.