vis4d.op.layer.attention¶
Attention layer.
Classes
- Attention – ViT Attention Layer.
- MultiheadAttention – A wrapper for torch.nn.MultiheadAttention.
- class Attention(dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0)[source]¶
ViT Attention Layer.
Modified from timm (https://github.com/huggingface/pytorch-image-models).
Init attention layer.
- Parameters:
dim (int) – Input tensor’s dimension.
num_heads (int, optional) – Number of attention heads. Defaults to 8.
qkv_bias (bool, optional) – Whether to add a bias to the qkv projection. Defaults to False.
attn_drop (float, optional) – Dropout rate for attention. Defaults to 0.0.
proj_drop (float, optional) – Dropout rate for projection. Defaults to 0.0.
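Example (a minimal sketch; the forward pass is assumed to take a token tensor of shape (batch, num_tokens, dim) and return a tensor of the same shape, following the timm implementation this layer is modified from):

>>> import torch
>>> from vis4d.op.layer.attention import Attention
>>> attn = Attention(dim=256, num_heads=8, qkv_bias=True)
>>> tokens = torch.rand(2, 197, 256)  # e.g. 196 patch tokens + 1 class token
>>> out = attn(tokens)
>>> out.shape
torch.Size([2, 197, 256])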
- class MultiheadAttention(embed_dims, num_heads, attn_drop=0.0, proj_drop=0.0, dropout_layer=None, batch_first=False, **kwargs)[source]¶
A wrapper for torch.nn.MultiheadAttention.
This module implements multi-head attention with an identity connection, and positional encodings are also passed as input.
Init MultiheadAttention.
- Parameters:
embed_dims (int) – The embedding dimension.
num_heads (int) – Parallel attention heads.
attn_drop (float) – Dropout rate applied to attn_output_weights. Defaults to 0.0.
proj_drop (float) – Dropout rate applied after nn.MultiheadAttention. Defaults to 0.0.
dropout_layer (nn.Module | None, optional) – The dropout layer used when adding the shortcut. Defaults to None.
batch_first (bool) – If True, key, query and value have shape (batch, n, embed_dim); otherwise (n, batch, embed_dim). Defaults to False.
- forward(query, key=None, value=None, identity=None, query_pos=None, key_pos=None, attn_mask=None, key_padding_mask=None)[source]¶
Forward function for MultiheadAttention.
**kwargs allows passing a more general data flow when combining this module with other operations in a transformer layer.
- Parameters:
query (Tensor) – The input query with shape [num_queries, bs, embed_dims] if self.batch_first is False, else [bs, num_queries, embed_dims].
key (Tensor) – The key tensor with shape [num_keys, bs, embed_dims] if self.batch_first is False, else [bs, num_keys, embed_dims]. If None, the query will be used. Defaults to None.
value (Tensor) – The value tensor with the same shape as key. Same as in nn.MultiheadAttention.forward. If None, the key will be used. Defaults to None.
identity (Tensor) – This tensor, with the same shape as query, will be used for the identity link. If None, query will be used. Defaults to None.
query_pos (Tensor) – The positional encoding for query, with the same shape as query. If not None, it will be added to query before attention. Defaults to None.
key_pos (Tensor) – The positional encoding for key, with the same shape as key. If not None, it will be added to key before attention. If None and query_pos has the same shape as key, query_pos will be used for key_pos. Defaults to None.
attn_mask (Tensor) – ByteTensor mask with shape [num_queries, num_keys]. Same as in nn.MultiheadAttention.forward. Defaults to None.
key_padding_mask (Tensor) – ByteTensor with shape [bs, num_keys]. Defaults to None.
- Returns:
Forwarded results with shape [num_queries, bs, embed_dims] if self.batch_first is False, else [bs, num_queries, embed_dims].
- Return type:
Tensor
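Example (a minimal self-attention sketch; shapes follow the parameter descriptions above with batch_first=True, and key/value default to the query when left as None):

>>> import torch
>>> from vis4d.op.layer.attention import MultiheadAttention
>>> self_attn = MultiheadAttention(embed_dims=256, num_heads=8, batch_first=True)
>>> query = torch.rand(2, 100, 256)      # [bs, num_queries, embed_dims]
>>> query_pos = torch.rand(2, 100, 256)  # positional encoding for query
>>> out = self_attn(query=query, query_pos=query_pos)  # identity shortcut added internally
>>> out.shape
torch.Size([2, 100, 256])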