vis4d.op.detect3d.bevformer.transformer

BEVFormer transformer.

Classes

PerceptionTransformer([num_cams, encoder, ...])

Perception Transformer.

class PerceptionTransformer(num_cams=6, encoder=None, decoder=None, embed_dims=256, num_feature_levels=4, rotate_center=(100, 100))[source]

Perception Transformer.

Init.

forward(mlvl_feats, can_bus, bev_queries, object_query_embed, bev_h, bev_w, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, grid_length, bev_pos, reg_branches, prev_bev=None)[source]

Forward function for BEVFormer transformer.

Parameters:
  • mlvl_feats (list(Tensor)) – Multi-level input features, one tensor per feature level. Each element has shape [bs, num_cams, embed_dims, h, w].

  • can_bus (Tensor) – The CAN bus signals, with shape [bs, 18].

  • bev_queries (Tensor) – The BEV queries, with shape [bev_h * bev_w, embed_dims].

  • object_query_embed (Tensor) – The query embedding for decoder, with shape [num_query, embed_dims * 2].

  • bev_h (int) – The height of the BEV feature map.

  • bev_w (int) – The width of the BEV feature map.

  • images_hw (tuple[int, int]) – The height and width of images.

  • cam_intrinsics (list[Tensor]) – The camera intrinsics.

  • cam_extrinsics (list[Tensor]) – The camera extrinsics.

  • lidar_extrinsics (Tensor) – The lidar extrinsics.

  • grid_length (tuple[float, float]) – The length of a grid cell in the x and y directions.

  • bev_pos (Tensor) – The BEV positional encoding, with shape [bs, embed_dims, bev_h, bev_w].

  • reg_branches (list[nn.Module]) – Regression heads for feature maps from each decoder layer.

  • prev_bev (Tensor, optional) – The previous BEV feature map, with shape [bev_h * bev_w, bs, embed_dims]. Defaults to None.

Returns:
  • bev_embed (Tensor) – BEV features, with shape [bev_h * bev_w, bs, embed_dims].

  • inter_states – Outputs from the decoder, with shape [1, bs, num_query, embed_dims].

  • reference_points – The initial reference points, with shape [bs, num_query, 4].

  • inter_references – The intermediate values of the reference points in the decoder, with shape [num_dec_layers, bs, num_query, embed_dims].
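
The shapes documented for forward can be gathered into a single lookup, which may help when checking inputs before a call. The following is a minimal bookkeeping sketch in plain Python: it does not import or call vis4d, the helper name forward_shapes is hypothetical, and the concrete sizes in the usage lines (batch size, BEV grid, number of queries, decoder layers, per-level feature sizes) are illustrative assumptions rather than library defaults.

```python
def forward_shapes(bs, num_cams, embed_dims, bev_h, bev_w,
                   num_query, num_dec_layers, level_hw):
    """Return (inputs, outputs) dicts of the shapes listed in the docs.

    level_hw is a list of (h, w) pairs, one per feature level.
    This only mirrors the documented shapes; it runs no model code.
    """
    inputs = {
        "mlvl_feats": [(bs, num_cams, embed_dims, h, w) for h, w in level_hw],
        "can_bus": (bs, 18),
        "bev_queries": (bev_h * bev_w, embed_dims),
        "object_query_embed": (num_query, embed_dims * 2),
        "bev_pos": (bs, embed_dims, bev_h, bev_w),
        "prev_bev": (bev_h * bev_w, bs, embed_dims),  # optional input
    }
    outputs = {
        "bev_embed": (bev_h * bev_w, bs, embed_dims),
        "inter_states": (1, bs, num_query, embed_dims),
        "reference_points": (bs, num_query, 4),
        "inter_references": (num_dec_layers, bs, num_query, embed_dims),
    }
    return inputs, outputs


# Illustrative sizes only (assumptions, not values taken from vis4d).
inputs, outputs = forward_shapes(
    bs=1, num_cams=6, embed_dims=256, bev_h=50, bev_w=50,
    num_query=900, num_dec_layers=6,
    level_hw=[(116, 200), (58, 100), (29, 50), (15, 25)],
)
print(inputs["bev_queries"])   # shape of the flattened BEV query grid
print(outputs["bev_embed"])    # shape of the returned BEV features
```

Comparing these tuples against tensor.shape for each argument is one way to catch a transposed or unflattened input early.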

get_bev_features(mlvl_feats, can_bus, bev_queries, bev_h, bev_w, images_hw, cam_intrinsics, cam_extrinsics, lidar_extrinsics, grid_length, bev_pos, prev_bev=None)[source]

Obtain BEV features.

Return type:

Tensor
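
The BEV tensors above store the bev_h by bev_w grid flattened into a single dimension of size bev_h * bev_w. As a rough sketch, and assuming row-major flattening (an assumption about the layout, not something this page states), a flat index can be mapped to a grid cell and back as follows. The helper names bev_flat_index and bev_cell are hypothetical.

```python
def bev_flat_index(row, col, bev_w):
    """Flat index of grid cell (row, col), assuming row-major order."""
    return row * bev_w + col


def bev_cell(index, bev_w):
    """Inverse mapping: (row, col) of a flat index, assuming row-major order."""
    return divmod(index, bev_w)


# Illustrative 50 x 50 grid (sizes are assumptions).
idx = bev_flat_index(2, 3, bev_w=50)
print(idx, bev_cell(idx, bev_w=50))
```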