vis4d.op.detect3d.bevformer.bevformer

BEVFormer head.

Functions

bbox3d2result(bbox_list, lidar2global)

Convert BEVFormer detection results to Detect3DOut.

Classes

BEVFormerHead([num_classes, embed_dims, ...])

BEVFormer 3D detection head.

class BEVFormerHead(num_classes=10, embed_dims=256, num_query=900, transformer=None, num_reg_fcs=2, num_cls_fcs=2, point_cloud_range=(-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), bev_h=200, bev_w=200)

BEVFormer 3D detection head.

Initialize BEVFormerHead.

Parameters:
  • num_classes (int, optional) – Number of classes. Defaults to 10.

  • embed_dims (int, optional) – Embedding dimensions. Defaults to 256.

  • num_query (int, optional) – Number of queries. Defaults to 900.

  • transformer (PerceptionTransformer, optional) – Transformer module. If None, a default transformer is created. Defaults to None.

  • num_reg_fcs (int, optional) – Number of fully connected layers in regression branch. Defaults to 2.

  • num_cls_fcs (int, optional) – Number of fully connected layers in classification branch. Defaults to 2.

  • point_cloud_range (Sequence[float], optional) – Point cloud range, given as (x_min, y_min, z_min, x_max, y_max, z_max). Defaults to (-51.2, -51.2, -5.0, 51.2, 51.2, 3.0).

  • bev_h (int, optional) – Height of the BEV grid, in cells. Defaults to 200.

  • bev_w (int, optional) – Width of the BEV grid, in cells. Defaults to 200.
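As a rough illustration of how point_cloud_range, bev_h and bev_w relate, the sketch below computes the metric size of one BEV cell from the default values. It assumes the range is ordered (x_min, y_min, z_min, x_max, y_max, z_max), that the grid covers the full x/y extent, and that bev_h runs along y and bev_w along x. The helper name bev_cell_size is hypothetical and not part of vis4d.

```python
def bev_cell_size(point_cloud_range, bev_h, bev_w):
    """Return the (height, width) of one BEV cell in meters.

    Hypothetical helper for illustration only; assumes the range is
    (x_min, y_min, z_min, x_max, y_max, z_max).
    """
    x_min, y_min, _, x_max, y_max, _ = point_cloud_range
    cell_h = (y_max - y_min) / bev_h  # assumption: bev_h spans the y extent
    cell_w = (x_max - x_min) / bev_w  # assumption: bev_w spans the x extent
    return cell_h, cell_w


# Default BEVFormerHead arguments from the signature above.
cell_h, cell_w = bev_cell_size((-51.2, -51.2, -5.0, 51.2, 51.2, 3.0), 200, 200)
print(f"{cell_h:.3f} m x {cell_w:.3f} m per BEV cell")
```

With the defaults, each cell covers roughly 0.512 m in both directions, so a larger bev_h or bev_w gives a finer grid over the same range.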

__call__(mlvl_feats, can_bus, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, prev_bev=None)

Type definition for the call. The arguments and return value are the same as for forward().

Return type:

tuple[Detect3DOut, Tensor]

forward(mlvl_feats, can_bus, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, prev_bev=None)

Forward function.

Parameters:
  • mlvl_feats (list[Tensor]) – Multi-level features from the upstream network, each with shape (B, N, C, H, W).

  • can_bus (Tensor) – CAN bus data, with shape (B, 18).

  • images_hw (tuple[int, int]) – Image height and width.

  • cam_intrinsics (list[Tensor]) – Camera intrinsics.

  • cam_extrinsics (list[Tensor]) – Camera extrinsics.

  • lidar_extrinsics (list[Tensor]) – LiDAR extrinsics.

  • prev_bev (Tensor, optional) – Previous BEV feature map, with shape (B, C, H, W). Defaults to None.

Returns:

Detection results and BEV feature map.

Return type:

tuple[Detect3DOut, Tensor]
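Because forward() both accepts an optional prev_bev and returns a BEV feature map, one plausible usage pattern is to thread the returned map into the next frame's call. The sketch below shows that loop under stated assumptions: run_sequence and the per-frame dictionary keys are hypothetical, head stands for any callable with the signature documented above, and it is assumed (not confirmed by this page) that the returned BEV tensor can be passed back as prev_bev without reshaping.

```python
def run_sequence(head, frames):
    """Run a BEVFormer-style head over consecutive frames.

    Hypothetical helper: `head` is any callable with the documented
    forward() signature; each frame is a dict holding its inputs.
    """
    prev_bev = None  # the first frame has no temporal history
    detections = []
    for frame in frames:
        det, bev = head(
            frame["mlvl_feats"],
            frame["can_bus"],
            frame["images_hw"],
            frame["cam_intrinsics"],
            frame["cam_extrinsics"],
            frame["lidar_extrinsics"],
            prev_bev=prev_bev,
        )
        detections.append(det)
        # Assumption: the returned BEV map is directly reusable as prev_bev.
        prev_bev = bev
    return detections
```

If the returned map does not match the (B, C, H, W) layout documented for prev_bev, it would need to be rearranged before being passed back in.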

bbox3d2result(bbox_list, lidar2global)

Convert BEVFormer detection results to Detect3DOut.

Parameters:
  • bbox_list (list[tuple[Tensor, Tensor, Tensor]]) – List of (bounding boxes, scores, labels) tuples.

  • lidar2global (Tensor) – LiDAR-to-global transformation, with shape (B, 4, 4).

Returns:

Detection results.

Return type:

Detect3DOut
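To make the role of lidar2global concrete, the sketch below applies a single 4x4 homogeneous transform to one box center expressed in LiDAR coordinates. It is not the implementation of bbox3d2result: the helper transform_point is hypothetical, plain nested lists stand in for one (4, 4) slice of the (B, 4, 4) tensor, and how the real function treats box orientation or velocity is not covered here.

```python
def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform (nested lists) to an (x, y, z) point.

    Hypothetical helper for illustration only.
    """
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed for a rigid transform.
    return tuple(
        sum(m * p for m, p in zip(row, homogeneous)) for row in matrix[:3]
    )


# Example matrix: 90-degree rotation about z, then a translation.
lidar2global = [
    [0.0, -1.0, 0.0, 10.0],
    [1.0, 0.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
center_global = transform_point(lidar2global, (1.0, 2.0, 0.0))
print(center_global)  # (8.0, 21.0, 0.5)
```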