vis4d.op.layer.patch_embed
Image to Patch Embedding using Conv2d.
Modified from vision_transformer (https://github.com/google-research/vision_transformer).
Classes
PatchEmbed: 2D Image to Patch Embedding.
- class PatchEmbed(img_size=224, patch_size=16, in_channels=3, embed_dim=768, norm_layer=None, flatten=True, bias=True)
2D Image to Patch Embedding.
Init PatchEmbed.
- Parameters:
img_size (int, optional) – Input image’s size. Defaults to 224.
patch_size (int, optional) – Patch size. Defaults to 16.
in_channels (int, optional) – Number of input image’s channels. Defaults to 3.
embed_dim (int, optional) – Patch embedding’s dim. Defaults to 768.
norm_layer (nn.Module, optional) – Normalization layer. Defaults to None, which means no normalization layer.
flatten (bool, optional) – Whether to flatten the output tensor. Defaults to True.
bias (bool, optional) – Whether to add a bias term to the convolution layer. Defaults to True.
- Raises:
ValueError – If the input image’s size is not divisible by the patch size.
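The relationship between img_size, patch_size, and the resulting number of patches can be sketched in plain Python. This is an illustrative sketch only, not the vis4d implementation: the helper names patch_grid and embed_shape are hypothetical, and the output layouts (batch, num_patches, embed_dim) when flatten=True and (batch, embed_dim, height, width) otherwise are assumptions based on common Vision Transformer implementations.

```python
from __future__ import annotations


def patch_grid(img_size: int = 224, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches_per_side, num_patches) for a square image.

    Hypothetical helper: mirrors the divisibility requirement documented
    under "Raises" above.
    """
    if img_size % patch_size != 0:
        raise ValueError(
            f"Image size ({img_size}) must be divisible by "
            f"patch size ({patch_size})."
        )
    per_side = img_size // patch_size
    return per_side, per_side * per_side


def embed_shape(
    batch_size: int,
    img_size: int = 224,
    patch_size: int = 16,
    embed_dim: int = 768,
    flatten: bool = True,
) -> tuple[int, ...]:
    """Return the expected output shape of a patch embedding.

    Assumed layouts (not confirmed by the documentation above):
    (B, N, C) when flattened, (B, C, H', W') otherwise.
    """
    per_side, num_patches = patch_grid(img_size, patch_size)
    if flatten:
        return (batch_size, num_patches, embed_dim)
    return (batch_size, embed_dim, per_side, per_side)


if __name__ == "__main__":
    # With the defaults, a 224x224 image splits into a 14x14 grid of patches.
    print(patch_grid())
    print(embed_shape(2))
    print(embed_shape(2, flatten=False))
```

With the default arguments, a 224-pixel image and 16-pixel patches give 14 patches per side, or 196 patches in total. A size such as 224 with patch_size=15 is not evenly divisible and raises ValueError in this sketch.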