vis4d.op.base.csp_darknet

CSP-Darknet base network used in YOLOX.

Modified from mmdetection (https://github.com/open-mmlab/mmdetection).

Classes

CSPDarknet([arch, deepen_factor, ...])

CSP-Darknet backbone used in YOLOv5 and YOLOX.

Focus(in_channels, out_channels[, ...])

Focus width and height information into channel space.

SPPBottleneck(in_channels, out_channels[, ...])

Spatial pyramid pooling layer used in YOLOv3-SPP.

class CSPDarknet(arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), frozen_stages=-1, arch_ovewrite=None, spp_kernal_sizes=(5, 9, 13), norm_eval=False)

CSP-Darknet backbone used in YOLOv5 and YOLOX.

Parameters:
  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Default: P5.

  • deepen_factor (float) – Depth multiplier: the number of blocks in each CSP layer is multiplied by this factor. Default: 1.0.

  • widen_factor (float) – Width multiplier: the number of channels in each layer is multiplied by this factor. Default: 1.0.

  • out_indices (Sequence[int]) – Indices of the stages whose outputs are returned. Default: (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (gradients stopped and eval mode set). -1 means no parameters are frozen. Default: -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Default: False.

  • arch_ovewrite (list[list[int]], optional) – Overwrite default arch settings. Defaults to None.

  • spp_kernal_sizes (Sequence[int]) – Kernel sizes of the pooling layers in the SPP block. Default: (5, 9, 13).

  • norm_eval (bool) – Whether to set norm layers to eval mode, i.e., freeze their running statistics (mean and variance). Only affects BatchNorm and its variants. Default: False.
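The effect of deepen_factor and widen_factor can be sketched in plain Python. This assumes the P5 per-stage settings and rounding rules of mmdetection's CSPDarknet (channels truncated with int, block counts rounded and clamped to at least one, index 0 of out_indices referring to the stem); P5_SETTINGS, scale_stages and output_channels are hypothetical helpers, not part of vis4d, whose details may differ.

```python
# Assumption: settings and rounding mirror mmdetection's CSPDarknet (arch="P5").
P5_SETTINGS = [
    # (in_channels, out_channels, num_blocks) for stage1..stage4
    (64, 128, 3),
    (128, 256, 9),
    (256, 512, 9),
    (512, 1024, 3),
]


def scale_stages(deepen_factor: float, widen_factor: float) -> list[tuple[int, int, int]]:
    """Return (in_channels, out_channels, num_blocks) per stage after scaling."""
    scaled = []
    for in_ch, out_ch, num_blocks in P5_SETTINGS:
        scaled.append(
            (
                int(in_ch * widen_factor),  # channel counts are truncated to int
                int(out_ch * widen_factor),
                max(round(num_blocks * deepen_factor), 1),  # keep at least one block
            )
        )
    return scaled


def output_channels(out_indices: tuple[int, ...], widen_factor: float) -> list[int]:
    """Channels of the returned feature maps; layer index 0 is the stem."""
    stem = int(P5_SETTINGS[0][0] * widen_factor)
    per_layer = [stem] + [int(out_ch * widen_factor) for _, out_ch, _ in P5_SETTINGS]
    return [per_layer[i] for i in out_indices]


print(scale_stages(0.33, 0.5))  # [(32, 64, 1), (64, 128, 3), (128, 256, 3), (256, 512, 1)]
print(output_channels((2, 3, 4), 1.0))  # [256, 512, 1024]
```

The (0.33, 0.5) pair is the depth/width setting commonly used for YOLOX-S; with the defaults the three returned feature maps have 256, 512 and 1024 channels.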

Example

>>> import torch
>>> from vis4d.op.base import CSPDarknet
>>> model = CSPDarknet()
>>> _ = model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)


forward(images)

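Forward pass. Runs the input through the backbone and returns the feature maps of the stages selected by out_indices, ordered from highest to lowest resolution.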

Parameters:

images (torch.Tensor) – Input images of shape [B, 3, H, W].

Return type:

list[Tensor]
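How forward collects the outputs selected by out_indices can be illustrated with a toy model. ToyStagedBackbone is hypothetical: every stem/stage block is replaced by a single stride-2 convolution, assuming (as in mmdetection) that layer index 0 is the stem and that each layer halves the spatial resolution.

```python
import torch
from torch import nn


class ToyStagedBackbone(nn.Module):
    """Hypothetical stand-in: a stem plus four stages, each halving H and W."""

    def __init__(self, channels=(64, 128, 256, 512, 1024), out_indices=(2, 3, 4)):
        super().__init__()
        layers = []
        in_ch = 3
        for out_ch in channels:
            # One stride-2 conv stands in for each real stem/stage block.
            layers.append(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1))
            in_ch = out_ch
        self.layers = nn.ModuleList(layers)
        self.out_indices = out_indices

    def forward(self, images: torch.Tensor) -> list[torch.Tensor]:
        outs = []
        for i, layer in enumerate(self.layers):
            images = layer(images)
            if i in self.out_indices:  # index 0 is the stem
                outs.append(images)
        return outs


model = ToyStagedBackbone().eval()
with torch.no_grad():
    feats = model(torch.rand(1, 3, 416, 416))
print([tuple(f.shape) for f in feats])
# [(1, 256, 52, 52), (1, 512, 26, 26), (1, 1024, 13, 13)]
```

Because layer i has stride 2 ** (i + 1), the default out_indices (2, 3, 4) yield strides 8, 16 and 32.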

train(mode=True)

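Set the module to training or evaluation mode. Overridden so that frozen stages stay frozen and, when norm_eval is True, normalization layers stay in eval mode.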

Parameters:

mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.

Return type:

CSPDarknet
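A minimal sketch of why train is overridden, assuming the mmdetection convention: calling nn.Module.train alone would switch frozen layers back to training mode, so the override re-freezes them and, with norm_eval=True, puts BatchNorm layers back into eval mode. ToyFreezableBackbone and _freeze_stages are hypothetical names used only for illustration.

```python
import torch
from torch import nn
from torch.nn.modules.batchnorm import _BatchNorm


class ToyFreezableBackbone(nn.Module):
    """Hypothetical two-layer backbone illustrating frozen_stages and norm_eval."""

    def __init__(self, frozen_stages: int = -1, norm_eval: bool = False):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))
        self.stage1 = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16))
        self.layers = ["stem", "stage1"]
        self.frozen_stages = frozen_stages
        self.norm_eval = norm_eval
        self._freeze_stages()

    def _freeze_stages(self) -> None:
        # Freeze layers 0..frozen_stages (index 0 is the stem); -1 freezes nothing.
        for i in range(self.frozen_stages + 1):
            layer = getattr(self, self.layers[i])
            layer.eval()
            for param in layer.parameters():
                param.requires_grad = False

    def train(self, mode: bool = True) -> "ToyFreezableBackbone":
        super().train(mode)  # sets .training on every submodule
        self._freeze_stages()  # re-apply eval mode to the frozen layers
        if mode and self.norm_eval:
            for module in self.modules():
                if isinstance(module, _BatchNorm):
                    module.eval()  # keep running mean/var fixed
        return self


model = ToyFreezableBackbone(frozen_stages=0, norm_eval=True).train()
print(model.stem.training, model.stage1[0].training, model.stage1[1].training)
# False True False
```

Returning self keeps the usual model.train() chaining behavior of nn.Module.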

class Focus(in_channels, out_channels, kernel_size=1, stride=1)

Focus width and height information into channel space.

Parameters:
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • kernel_size (int, optional) – The kernel size of the convolution. Defaults to 1.

  • stride (int, optional) – The stride of the convolution. Defaults to 1.


forward(features)

Forward pass.

Parameters:

features (torch.Tensor) – The input tensor of shape [B, C, H, W].

Return type:

Tensor
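The Focus operation can be read as a space-to-depth rearrangement followed by a convolution: each 2x2 neighborhood is split into four pixel phases that are stacked along the channel axis, so [B, C, H, W] becomes [B, 4C, H/2, W/2] without discarding any values. The sketch below assumes even H and W and uses a plain conv + BatchNorm + SiLU block; FocusSketch and space_to_depth are hypothetical, and the channel order of the four slices is assumed to follow mmdetection's Focus.

```python
import torch
from torch import nn


def space_to_depth(x: torch.Tensor) -> torch.Tensor:
    """[B, C, H, W] -> [B, 4C, H/2, W/2]; H and W are assumed to be even."""
    top_left = x[..., ::2, ::2]
    top_right = x[..., ::2, 1::2]
    bot_left = x[..., 1::2, ::2]
    bot_right = x[..., 1::2, 1::2]
    # Slice order assumed from mmdetection; any fixed order is equally lossless.
    return torch.cat((top_left, bot_left, top_right, bot_right), dim=1)


class FocusSketch(nn.Module):
    """Hypothetical Focus-style layer: space-to-depth, then conv + BN + SiLU."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 1, stride: int = 1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(
                in_channels * 4,
                out_channels,
                kernel_size,
                stride,
                padding=(kernel_size - 1) // 2,  # "same" padding for odd kernels
                bias=False,
            ),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.conv(space_to_depth(features))


x = torch.rand(2, 3, 416, 416)
print(tuple(space_to_depth(x).shape))  # (2, 12, 208, 208)
print(tuple(FocusSketch(3, 64, 3)(x).shape))  # (2, 64, 208, 208)
```

Compared with a strided convolution, this halves the resolution while keeping every input pixel available to the following convolution.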

class SPPBottleneck(in_channels, out_channels, kernel_sizes=(5, 9, 13))

Spatial pyramid pooling layer used in YOLOv3-SPP.

Parameters:
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • kernel_sizes (Sequence[int], optional) – Kernel sizes of the pooling layers. Defaults to (5, 9, 13).


forward(features)

Forward pass.

Parameters:

features (torch.Tensor) – Input features.

Return type:

Tensor
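A sketch of the SPP bottleneck, assuming the mmdetection layout: a 1x1 convolution halves the channels, one max pool per entry of kernel_sizes (stride 1, padding k // 2) keeps the spatial size, the unpooled and pooled maps are concatenated along the channel axis, and a second 1x1 convolution maps the result to out_channels. SPPBottleneckSketch is hypothetical and omits the normalization and activation layers the real module may use.

```python
import torch
from torch import nn


class SPPBottleneckSketch(nn.Module):
    """Hypothetical SPP block: 1x1 conv, parallel max pools, concat, 1x1 conv."""

    def __init__(self, in_channels: int, out_channels: int, kernel_sizes=(5, 9, 13)):
        super().__init__()
        mid_channels = in_channels // 2
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1)
        # Stride 1 with padding k // 2 keeps H and W unchanged for odd k.
        self.poolings = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )
        self.conv2 = nn.Conv2d(mid_channels * (len(kernel_sizes) + 1), out_channels, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        x = self.conv1(features)
        # Concatenate the unpooled map with one pooled map per kernel size.
        x = torch.cat([x] + [pool(x) for pool in self.poolings], dim=1)
        return self.conv2(x)


spp = SPPBottleneckSketch(1024, 1024)
out = spp(torch.rand(1, 1024, 13, 13))
print(tuple(out.shape))  # (1, 1024, 13, 13)
```

Since every branch preserves the spatial size, the block enlarges the receptive field of the deepest feature map without changing its resolution.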