vis4d.op.layer.attention

Attention layer.

Classes

Attention(dim[, num_heads, qkv_bias, ...])

ViT Attention Layer.

MultiheadAttention(embed_dims, num_heads[, ...])

A wrapper for torch.nn.MultiheadAttention.

class Attention(dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0)[source]

ViT Attention Layer.

Modified from timm (https://github.com/huggingface/pytorch-image-models).

Init attention layer.

Parameters:
  • dim (int) – Input tensor’s dimension.

  • num_heads (int, optional) – Number of attention heads. Defaults to 8.

  • qkv_bias (bool, optional) – Whether to add a bias to the qkv projection. Defaults to False.

  • attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.

  • proj_drop (float, optional) – Dropout rate for projection. Defaults to 0.0.

__call__(data)[source]

Applies the layer.

Parameters:

data (Tensor) – Input tensor of shape (B, N, dim).

Returns:

Output tensor of the same shape as input.

Return type:

Tensor
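
A minimal usage sketch (not taken from the library itself): it assumes the import path matches the module name at the top of this page and that the layer is a standard nn.Module, so calling the instance dispatches to __call__/forward; the tensor sizes are illustrative only.

    import torch

    from vis4d.op.layer.attention import Attention

    # dim=256 split across 8 heads (dim is typically required to be divisible
    # by num_heads); qkv_bias=True adds a bias to the qkv projection.
    attn = Attention(dim=256, num_heads=8, qkv_bias=True)

    x = torch.rand(2, 196, 256)  # (B, N, dim)
    out = attn(x)                # output keeps the input shape
    assert out.shape == (2, 196, 256)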

forward(x)[source]

Forward pass.

Return type:

Tensor

class MultiheadAttention(embed_dims, num_heads, attn_drop=0.0, proj_drop=0.0, dropout_layer=None, batch_first=False, **kwargs)[source]

A wrapper for torch.nn.MultiheadAttention.

This module implements MultiheadAttention with an identity connection, and positional encoding can also be passed as input.

Init MultiheadAttention.

Parameters:
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • attn_drop (float) – Dropout rate applied to attn_output_weights. Default: 0.0.

  • proj_drop (float) – Dropout rate applied after nn.MultiheadAttention. Default: 0.0.

  • dropout_layer (nn.Module | None, optional) – The dropout_layer used when adding the shortcut. Defaults to None.

  • batch_first (bool) – If True, key, query, and value have shape (batch, n, embed_dims); otherwise (n, batch, embed_dims). Defaults to False.
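
A hedged construction sketch; the import path is assumed from the module name above, and the embedding size and dropout rates are arbitrary example values.

    from vis4d.op.layer.attention import MultiheadAttention

    # batch_first=True makes query/key/value use shape (batch, n, embed_dims).
    mha = MultiheadAttention(
        embed_dims=256,
        num_heads=8,
        attn_drop=0.1,
        proj_drop=0.1,
        batch_first=True,
    )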

forward(query, key=None, value=None, identity=None, query_pos=None, key_pos=None, attn_mask=None, key_padding_mask=None)[source]

Forward function for MultiheadAttention.

**kwargs allows passing a more general data flow when combining with other operations in a transformer layer.

Parameters:
  • query (Tensor) – The input query with shape [num_queries, bs, embed_dims] if self.batch_first is False, else [bs, num_queries, embed_dims].

  • key (Tensor) – The key tensor with shape [num_keys, bs, embed_dims] if self.batch_first is False, else [bs, num_keys, embed_dims]. If None, the query will be used. Defaults to None.

  • value (Tensor) – The value tensor with the same shape as key. Same as in nn.MultiheadAttention.forward. If None, the key will be used. Defaults to None.

  • identity (Tensor) – This tensor, with the same shape as query, will be used for the identity link. If None, query will be used. Defaults to None.

  • query_pos (Tensor) – The positional encoding for query, with the same shape as query. If not None, it will be added to query before the forward function. Defaults to None.

  • key_pos (Tensor) – The positional encoding for key, with the same shape as key. If not None, it will be added to key before the forward function. If None and query_pos has the same shape as key, query_pos will be used as key_pos. Defaults to None.

  • attn_mask (Tensor) – ByteTensor mask with shape [num_queries, num_keys]. Same as in nn.MultiheadAttention.forward. Defaults to None.

  • key_padding_mask (Tensor) – ByteTensor with shape [bs, num_keys]. Defaults to None.

Returns:

Forwarded results with shape [num_queries, bs, embed_dims] if self.batch_first is False, else [bs, num_queries, embed_dims].

Return type:

Tensor
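
A sketch of the forward call under the defaults described above (shapes are illustrative; it assumes the wrapper is a standard nn.Module, so calling the instance dispatches to forward): with key, value, and identity left as None, the layer performs self-attention on query with an identity connection, and query_pos is added to query before attention.

    import torch

    from vis4d.op.layer.attention import MultiheadAttention

    mha = MultiheadAttention(embed_dims=256, num_heads=8)  # batch_first=False

    query = torch.rand(100, 2, 256)      # (num_queries, bs, embed_dims)
    query_pos = torch.rand(100, 2, 256)  # positional encoding, same shape as query

    # key/value default to query and identity defaults to query as well,
    # so this is self-attention with an identity (residual) connection.
    out = mha(query, query_pos=query_pos)
    assert out.shape == (100, 2, 256)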