This guide is a reference for the tensor shapes that common AI frameworks use for images, audio, 3D meshes, video, point clouds, and text.
Important: These are common conventions, not enforced rules.

Note:
- Batch dimension (B) is typically the first dimension when present
- Some frameworks allow flexible dimension ordering through configuration
- Shape conventions might vary based on specific functions or models within frameworks
- Many frameworks support both channel-first and channel-last formats with configuration options

Dimension abbreviations used in the tables:

- B: Batch size
- C: Channels
- T: Time steps / Sequence length
- H: Height
- W: Width
- D: Depth
- V: Vertices
- F: Faces
- S: Samples
- Fr: Frames
- M: Mel bands (for spectrograms)

Image data:

| Framework | Image Shape | Batch Shape | Channel Ordering | Notes |
|---|---|---|---|---|
| NumPy | (H, W), (H, W, C) | (B, H, W, C) | RGB | C=1/3/4 for gray/RGB/RGBA |
| PyTorch | (C, H, W) | (B, C, H, W) | RGB | C=1/3/4 for gray/RGB/RGBA |
| TensorFlow | (H, W, C) | (B, H, W, C) | RGB | C=1/3/4 for gray/RGB/RGBA |
| Keras | (H, W, C) | (B, H, W, C) | RGB | Same as TF |
| OpenCV | (H, W), (H, W, C) | N/A | BGR | C=3 for BGR, grayscale is 2D |
| PIL | (H, W), (H, W, C) | N/A | RGB | `Image.size` is (W, H), but the array from `np.asarray(img)` is (H, W, C) |
| Matplotlib | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |
| Scikit-image | (H, W), (H, W, C) | N/A | RGB | C=1/3/4 for gray/RGB/RGBA |
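
The layouts in the table can be converted into one another with plain array operations. The following is a minimal NumPy-only sketch, assuming a dummy BGR image in the (H, W, C) layout that OpenCV uses; the helper names `bgr_to_rgb`, `hwc_to_chw`, and `add_batch_dim` are illustrative and not part of any library.

```python
import numpy as np

def bgr_to_rgb(image_hwc: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an (H, W, 3) image (BGR <-> RGB)."""
    return image_hwc[..., ::-1]

def hwc_to_chw(image_hwc: np.ndarray) -> np.ndarray:
    """Move channels first: (H, W, C) -> (C, H, W)."""
    return np.transpose(image_hwc, (2, 0, 1))

def add_batch_dim(image: np.ndarray) -> np.ndarray:
    """Prepend a batch axis of size 1: (...) -> (1, ...)."""
    return image[np.newaxis, ...]

# Dummy 4x6 BGR image (H=4, W=6, C=3), standing in for an OpenCV frame.
bgr = np.zeros((4, 6, 3), dtype=np.uint8)
bgr[..., 0] = 255  # fill the blue channel (index 0 in BGR)

rgb = bgr_to_rgb(bgr)        # (4, 6, 3), blue is now at index 2
chw = hwc_to_chw(rgb)        # (3, 4, 6)
batch = add_batch_dim(chw)   # (1, 3, 4, 6)
print(rgb.shape, chw.shape, batch.shape)
```

Slicing with `[..., ::-1]` returns a view with a negative stride; wrap the result in `np.ascontiguousarray` if a downstream library requires contiguous memory.
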

Audio data:

| Framework | Raw Audio Shape | Spectrogram Shape | Notes |
|---|---|---|---|
| Librosa | (S,) | (M, Fr) | M=128 mel bands by default |
| Torchaudio | (C, S) | (C, M, Fr) | C=1 for mono, C=2 for stereo, M=128 mel bands by default |
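| TensorFlow | (S,) | (Fr, F) | Here F = frequency bins (not faces): `fft_length // 2 + 1` from `tf.signal.stft`, e.g. 129 for `fft_length=256` |
| Soundfile | (S,) or (S, C) | N/A | Mono files are read as (S,) by default; `always_2d=True` gives (S, 1). C=2 for stereo |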
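
Moving audio between the sample-first and channel-first layouts in the table is a transpose, and the number of frequency bins of a real-input FFT follows from the FFT length. A minimal NumPy-only sketch, assuming silent dummy audio at a 16 kHz sample rate; `to_channels_first` and `num_frequency_bins` are hypothetical helper names, not library functions.

```python
import numpy as np

def to_channels_first(audio: np.ndarray) -> np.ndarray:
    """Convert (S,) or (S, C) audio to channel-first (C, S)."""
    if audio.ndim == 1:      # mono: (S,) -> (1, S)
        return audio[np.newaxis, :]
    return audio.T           # multi-channel: (S, C) -> (C, S)

def num_frequency_bins(fft_length: int) -> int:
    """Unique bins of a real-input FFT: fft_length // 2 + 1."""
    return fft_length // 2 + 1

stereo = np.zeros((16000, 2), dtype=np.float32)  # (S, C): 1 s at an assumed 16 kHz
mono = np.zeros(16000, dtype=np.float32)         # (S,)

print(to_channels_first(stereo).shape)  # (2, 16000)
print(to_channels_first(mono).shape)    # (1, 16000)
print(num_frequency_bins(256))          # 129
```
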

3D mesh data:

| Framework | Vertices Shape | Faces Shape | Batch Shape | Additional Attributes | Notes |
|---|---|---|---|---|---|
| PyTorch3D | (V, 3) | (F, 3) | (B, V, 3) | Textures: (F, H, W, 3) or (V, 3); Normals: (V, 3); UV coords: (V, 2) | V = vertices, F = faces. Supports packed/padded representations |
| Open3D | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); Vertex colors: (V, 3); Triangle normals: (F, 3) | Primarily for single mesh operations |
| Trimesh | (V, 3) | (F, 3) | N/A | Vertex normals: (V, 3); Face normals: (F, 3); UV coords: (V, 2) | Focuses on watertight meshes |
| Kaolin | (V, 3) | (F, 3) | (B, V, 3) | Face uvs: (F, 3, 2); Vertex normals: (V, 3); Face normals: (F, 3) | NVIDIA's 3D DL library |
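
Meshes in a batch usually have different vertex counts, which is why padded and packed representations exist. The sketch below illustrates the two ideas conceptually in NumPy; it is not PyTorch3D's implementation, and `pad_vertices` and `pack_vertices` are hypothetical helpers written for this example.

```python
import numpy as np

def pad_vertices(verts_list, pad_value=0.0):
    """Stack per-mesh (V_i, 3) arrays into one padded (B, V_max, 3) array."""
    v_max = max(v.shape[0] for v in verts_list)
    padded = np.full((len(verts_list), v_max, 3), pad_value, dtype=np.float32)
    for i, v in enumerate(verts_list):
        padded[i, : v.shape[0]] = v
    return padded

def pack_vertices(verts_list):
    """Concatenate per-mesh (V_i, 3) arrays into one packed (sum(V_i), 3) array.

    Also returns the index of the source mesh for every packed vertex.
    """
    packed = np.concatenate(verts_list, axis=0)
    mesh_idx = np.concatenate(
        [np.full(v.shape[0], i, dtype=np.int64) for i, v in enumerate(verts_list)]
    )
    return packed, mesh_idx

# Two toy meshes: a triangle (V=3) and a quad (V=4).
tri = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float32)
quad = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float32)

padded = pad_vertices([tri, quad])              # (2, 4, 3)
packed, mesh_idx = pack_vertices([tri, quad])   # (7, 3) and (7,)

# Faces index into vertices: (F, 3) indices -> (F, 3, 3) corner coordinates.
quad_faces = np.array([[0, 1, 2], [0, 2, 3]], dtype=np.int64)
corners = quad[quad_faces]                      # (2, 3, 3)
print(padded.shape, packed.shape, mesh_idx.tolist(), corners.shape)
```

Padding keeps a regular (B, V, 3) shape at the cost of unused rows, while packing stores no filler but needs the per-vertex mesh index to recover which mesh a vertex belongs to.
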

Video data:

| Framework | Common Shape | Common Batch Shape | Notes |
|---|---|---|---|
| PyTorch | (C, Fr, H, W) | (B, C, Fr, H, W) | Flexible ordering, commonly channel-first for consistency with image processing |
| | (Fr, C, H, W) | (B, Fr, C, H, W) | Some models/datasets use frame-first convention |
| TensorFlow | (Fr, H, W, C) | (B, Fr, H, W, C) | Flexible ordering, commonly channel-last for consistency with image processing |
| | (H, W, C, Fr) | (B, H, W, C, Fr) | Some models use different ordering for specific architectures |
| Torchvision | (C, Fr, H, W) | (B, C, Fr, H, W) | Video models typically follow PyTorch's channel-first convention; `torchvision.io.read_video` returns (Fr, H, W, C) by default, and transforms can reorder dimensions |
| OpenCV | (H, W, C) | N/A | Returns individual frames in BGR (C=3). VideoCapture reads frame by frame |
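
Frames that arrive one at a time in BGR, as with frame-by-frame reading, can be stacked into a clip and then reordered into the channel-first layout. A minimal NumPy-only sketch, assuming dummy frames in place of a real video source; `frames_to_clip` and `clip_to_channels_first` are illustrative names.

```python
import numpy as np

def frames_to_clip(frames_bgr):
    """Stack (H, W, 3) BGR frames into an RGB clip of shape (Fr, H, W, C)."""
    clip = np.stack(frames_bgr, axis=0)   # (Fr, H, W, C), still BGR
    return clip[..., ::-1]                # reverse channels: BGR -> RGB

def clip_to_channels_first(clip_fhwc):
    """Reorder (Fr, H, W, C) -> (C, Fr, H, W)."""
    return np.transpose(clip_fhwc, (3, 0, 1, 2))

# Eight dummy 32x48 frames, standing in for frames read one at a time.
frames = [np.zeros((32, 48, 3), dtype=np.uint8) for _ in range(8)]

clip = frames_to_clip(frames)         # (8, 32, 48, 3)
cfhw = clip_to_channels_first(clip)   # (3, 8, 32, 48)
batch = cfhw[np.newaxis]              # (1, 3, 8, 32, 48)
print(clip.shape, cfhw.shape, batch.shape)
```
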

Point cloud data:

| Framework | Points Shape | Batch Shape | Notes |
|---|---|---|---|
| PyTorch3D | (N, 3) | (B, N, 3) | N = number of points |
| Open3D | (N, 3) | N/A | xyz coordinates |
| NumPy | (N, 3) | (B, N, 3) | Basic representation |
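
A (B, N, 3) batch requires every cloud to have the same N. One common workaround, sketched here under the assumption that random subsampling is acceptable for the task, is to sample each cloud to a fixed size before stacking; `sample_fixed_size` is a hypothetical helper and the clouds are random dummy data.

```python
import numpy as np

def sample_fixed_size(points, num_points, rng):
    """Randomly pick num_points rows from an (N, 3) cloud.

    Samples with replacement only when the cloud has fewer than num_points points.
    """
    n = points.shape[0]
    idx = rng.choice(n, size=num_points, replace=n < num_points)
    return points[idx]

rng = np.random.default_rng(0)
clouds = [rng.random((n, 3)) for n in (500, 1200, 80)]  # three (N_i, 3) clouds

batch = np.stack([sample_fixed_size(c, 256, rng) for c in clouds])  # (3, 256, 3)

# Center each cloud on its own centroid; the mean is taken over the N axis.
centered = batch - batch.mean(axis=1, keepdims=True)
print(batch.shape, centered.shape)
```
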

Text data:

| Framework | Text Shape | Batch Shape | Notes |
|---|---|---|---|
| PyTorch | (T,) | (B, T) | T = sequence length |
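| TensorFlow | (T,) | (B, T) | T = sequence length |
| HuggingFace | (T,) | (B, T) | Tokenizers return `input_ids` together with an `attention_mask` of the same shape |
| SpaCy | (T,) | N/A | A `Doc` object is a sequence of T tokens, not a tensor |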
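
Going from individual (T,) sequences to a (B, T) batch requires padding to a common length, which is where attention masks come from. A minimal NumPy-only sketch, assuming made-up token IDs and a padding ID of 0; `pad_batch` is a hypothetical helper, not a tokenizer API.

```python
import numpy as np

def pad_batch(sequences, pad_id=0):
    """Pad variable-length token-ID lists to a (B, T) array plus a (B, T) mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(sequences), max_len), dtype=np.int64)
    for i, seq in enumerate(sequences):
        input_ids[i, : len(seq)] = seq
        attention_mask[i, : len(seq)] = 1   # 1 = real token, 0 = padding
    return input_ids, attention_mask

# Made-up token IDs; a real tokenizer would produce these.
sequences = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
input_ids, attention_mask = pad_batch(sequences)
print(input_ids.shape)           # (2, 5)
print(attention_mask.tolist())   # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```
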