vis4d.op.detect3d.bevformer

BEVFormer ops.

class BEVFormerHead(num_classes=10, embed_dims=256, num_query=900, transformer=None, num_reg_fcs=2, num_cls_fcs=2, point_cloud_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), bev_h=200, bev_w=200)

BEVFormer 3D detection head.

Initialize BEVFormerHead.

Parameters:
  • num_classes (int, optional) – Number of classes. Defaults to 10.

  • embed_dims (int, optional) – Embedding dimensions. Defaults to 256.

  • num_query (int, optional) – Number of queries. Defaults to 900.

  • transformer (PerceptionTransformer, optional) – Transformer. Defaults to None. If None, a default transformer will be created.

  • num_reg_fcs (int, optional) – Number of fully connected layers in regression branch. Defaults to 2.

  • num_cls_fcs (int, optional) – Number of fully connected layers in classification branch. Defaults to 2.

  • point_cloud_range (Sequence[float], optional) – Point cloud range. Defaults to (-51.2, -51.2, -5.0, 51.2, 51.2, 3.0).

  • bev_h (int, optional) – BEV height. Defaults to 200.

  • bev_w (int, optional) – BEV width. Defaults to 200.
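The defaults for point_cloud_range, bev_h and bev_w together imply a BEV cell size. The sketch below works out that arithmetic. It assumes the range is ordered (x_min, y_min, z_min, x_max, y_max, z_max), that bev_w spans the x axis and bev_h the y axis, and that the units are metres; none of this is stated on this page. The helper name bev_cell_size is hypothetical and not part of vis4d.

```python
from typing import Sequence, Tuple


def bev_cell_size(
    point_cloud_range: Sequence[float], bev_h: int, bev_w: int
) -> Tuple[float, float]:
    """Return the (x, y) extent covered by one BEV cell.

    Assumes the range is (x_min, y_min, z_min, x_max, y_max, z_max),
    with bev_w cells along x and bev_h cells along y.
    """
    x_min, y_min, _z_min, x_max, y_max, _z_max = point_cloud_range
    return (x_max - x_min) / bev_w, (y_max - y_min) / bev_h


# Default values taken from the BEVFormerHead signature.
default_range = (-51.2, -51.2, -5.0, 51.2, 51.2, 3.0)
cell_x, cell_y = bev_cell_size(default_range, bev_h=200, bev_w=200)
print(round(cell_x, 3), round(cell_y, 3))  # about 0.512 per cell on each axis
```

Under these assumptions, the default 200 x 200 grid covers 102.4 units per side, so each cell is roughly 0.512 units wide.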

__call__(mlvl_feats, can_bus, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, prev_bev=None)

Typed call signature. The arguments and return type match forward().

Return type:

tuple[Detect3DOut, Tensor]

forward(mlvl_feats, can_bus, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, prev_bev=None)

Forward function.

Parameters:
  • mlvl_feats (list[Tensor]) – Multi-level features from the upstream network, each with shape (B, N, C, H, W).

  • can_bus (Tensor) – CAN bus data, with shape (B, 18).

  • images_hw (tuple[int, int]) – Image height and width.

  • cam_intrinsics (list[Tensor]) – Camera intrinsics.

  • cam_extrinsics (list[Tensor]) – Camera extrinsics.

  • lidar_extrinsics (list[Tensor]) – LiDAR extrinsics.

  • prev_bev (Tensor, optional) – Previous BEV feature map, with shape (B, C, H, W). Defaults to None.

Returns:

Detection results and BEV feature map.

Return type:

tuple[Detect3DOut, Tensor]
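Because forward() accepts a prev_bev argument and returns a BEV feature map next to the detections, one plausible way to process a video sequence is to feed each returned BEV map into the next call. The sketch below shows only that loop. It assumes the returned tensor is meant to be reused as prev_bev, which this page does not state outright. run_sequence and dummy_head are hypothetical names; dummy_head stands in for a real BEVFormerHead so the example runs without vis4d or PyTorch.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_sequence(
    head: Callable[..., Tuple[Any, Any]],
    frames: Iterable[Dict[str, Any]],
) -> List[Any]:
    """Call `head` once per frame, passing the previous BEV map forward.

    Each frame is a dict of keyword arguments for the head
    (mlvl_feats, can_bus, images_hw, ...), excluding prev_bev.
    """
    prev_bev = None  # the first frame has no history
    detections = []
    for frame in frames:
        dets, bev = head(**frame, prev_bev=prev_bev)
        detections.append(dets)
        prev_bev = bev  # assumption: reuse the returned BEV map next step
    return detections


def dummy_head(frame_id: int, prev_bev: Any = None) -> Tuple[str, str]:
    """Hypothetical stand-in that mimics the (detections, bev) return."""
    history = "no history" if prev_bev is None else prev_bev
    return f"dets[{frame_id}] using {history}", f"bev[{frame_id}]"


for line in run_sequence(dummy_head, [{"frame_id": i} for i in range(3)]):
    print(line)
```

With a real head, each frame dict would carry the arguments listed above. A real pipeline would probably also reset prev_bev to None at the start of each new sequence.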

class GridMask(use_h, use_w, rotate=1, offset=False, ratio=0.5, mode=0, prob=1.0)

Grid Mask Layer.

Initialize GridMask.

forward(x)

Forward function.

Return type:

Tensor
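The GridMask parameters are not documented on this page. As rough orientation, the sketch below illustrates the general grid-mask idea: zeroing regularly spaced stripes of an image. It is a simplified, deterministic illustration and not the vis4d implementation. Reading use_h and use_w as row and column stripes, ratio as the stripe fraction of each period, and invert as a stand-in for mode are all assumptions. Random rotation, random offsets and the application probability are left out. The function name grid_mask and its period argument are hypothetical.

```python
from typing import List


def grid_mask(
    height: int,
    width: int,
    period: int,
    ratio: float = 0.5,
    use_h: bool = True,
    use_w: bool = True,
    invert: bool = False,
) -> List[List[int]]:
    """Build a 0/1 mask with zeroed stripes repeating every `period` pixels.

    `period` must be at least 2. A 1 keeps the pixel and a 0 drops it.
    """
    # Stripe thickness: a fraction of the period, clamped to [1, period - 1].
    stripe = min(max(int(period * ratio + 0.5), 1), period - 1)
    mask = [[1] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            in_row_stripe = use_h and row % period < stripe
            in_col_stripe = use_w and col % period < stripe
            if in_row_stripe or in_col_stripe:
                mask[row][col] = 0
    if invert:
        # Swap kept and dropped pixels.
        mask = [[1 - value for value in row] for row in mask]
    return mask


for row in grid_mask(4, 4, period=2):
    print(row)
```

Multiplying an image elementwise by such a mask drops the zeroed regions, which is the augmentation effect the technique is named for.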

Modules

vis4d.op.detect3d.bevformer.bevformer

BEVFormer head.

vis4d.op.detect3d.bevformer.decoder

BEVFormer decoder.

vis4d.op.detect3d.bevformer.encoder

BEVFormer encoder.

vis4d.op.detect3d.bevformer.grid_mask

Grid mask for BEVFormer.

vis4d.op.detect3d.bevformer.spatial_cross_attention

Spatial Cross Attention Module for BEVFormer.

vis4d.op.detect3d.bevformer.temporal_self_attention

Temporal self-attention module for BEVFormer, based on Deformable DETR.

vis4d.op.detect3d.bevformer.transformer

BEVFormer transformer.