qualia_core.dataset.MNIST module

class qualia_core.dataset.MNIST.IDXType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: IntEnum

List of possible data types of an IDX file.

UINT8 = 8
INT8 = 9
INT16 = 11
INT32 = 12
FLOAT32 = 13
FLOAT64 = 14
as_numpy_dtype() → dtype[Any][source]

Convert the selected enum type to a numpy.dtype object, with big-endian byte order.

Returns:

numpy.dtype for the corresponding data type

class qualia_core.dataset.MNIST.IDXMagicNumber[source]

Bases: BigEndianStructure

Magic number of IDX file format.

Header of 4 bytes:

  • First 2 bytes are always 0

  • 3rd byte is the data type, one of IDXType

  • 4th byte is the number of dimensions that follow the magic number

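dtype

Structure/Union member. 3rd byte of the header: the data type, one of IDXType.

n_dims

Structure/Union member. 4th byte of the header: the number of dimensions that follow the magic number.

null

Structure/Union member. First 2 bytes of the header, always 0.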
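The 4-byte header layout can be sketched with ctypes as follows. This is a minimal illustration, not the library's definition: the class name `MagicNumberSketch` and the exact field types are assumptions chosen to match the byte layout described for the magic number.

```python
import ctypes


class MagicNumberSketch(ctypes.BigEndianStructure):
    """Hypothetical 4-byte IDX header layout; field types are assumed."""

    _fields_ = [
        ('null', ctypes.c_uint16),   # first 2 bytes, always 0
        ('dtype', ctypes.c_uint8),   # 3rd byte: data type code
        ('n_dims', ctypes.c_uint8),  # 4th byte: number of dimensions
    ]


# Header of a 3-dimensional UINT8 array (type code 8, 3 dimensions).
header = MagicNumberSketch.from_buffer_copy(b'\x00\x00\x08\x03')
print(header.null, header.dtype, header.n_dims)
```

Copying the raw bytes into the structure avoids manual bit shifting, and the big-endian base class keeps the 2-byte `null` field independent of the host byte order.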

class qualia_core.dataset.MNIST.MNISTBase(path: str = '', dtype: str = 'float32')[source]

Bases: RawDataset

Base class for MNIST-style datasets (MNIST and Fashion-MNIST).

This class provides common functionality for loading and processing datasets that use the IDX file format. Both MNIST and Fashion-MNIST share the same:

  • File format (IDX)

  • Image dimensions (28x28 pixels)

  • Number of classes (10)

  • Dataset sizes (60,000 training, 10,000 test)

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The files are organized as:

  • magic number (4 bytes) identifying data type and dimensions

  • dimension sizes (4 bytes each)

  • data in row-major order
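The file organization described above can be sketched with the standard `struct` module. This is an illustrative parser, not the code used by MNISTBase: the function name `parse_idx` and the table `_STRUCT_CODES` are hypothetical, and dimension sizes are read here as unsigned big-endian integers.

```python
import struct

# struct format characters for each IDX type code ('>' selects big-endian).
_STRUCT_CODES = {8: 'B', 9: 'b', 11: 'h', 12: 'i', 13: 'f', 14: 'd'}


def parse_idx(buffer: bytes) -> tuple[tuple[int, ...], list]:
    """Parse an IDX buffer into (shape, flat list of values in row-major order)."""
    # Magic number: 2 zero bytes, type code, number of dimensions.
    zero, type_code, n_dims = struct.unpack_from('>HBB', buffer, 0)
    if zero != 0:
        raise ValueError('first two bytes of an IDX file must be 0')
    # Each dimension size occupies 4 bytes after the magic number.
    shape = struct.unpack_from(f'>{n_dims}I', buffer, 4)
    count = 1
    for size in shape:
        count *= size
    # The data starts right after the dimension sizes.
    data_offset = 4 + 4 * n_dims
    values = struct.unpack_from(f'>{count}{_STRUCT_CODES[type_code]}', buffer, data_offset)
    return shape, list(values)


# Build a tiny 2x3 UINT8 array in IDX form and read it back.
sample = struct.pack('>HBB', 0, 8, 2) + struct.pack('>2I', 2, 3) + bytes(range(6))
shape, values = parse_idx(sample)
print(shape, values)
```

With this layout, a 3-dimensional image file has a header of 4 + 3 × 4 = 16 bytes before the pixel data begins.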

Initialize an MNIST-style dataset.

Parameters:
  • path – Directory containing the IDX files

  • dtype – Data type to convert images to

class qualia_core.dataset.MNIST.MNIST(path: str = '', dtype: str = 'float32')[source]

Bases: MNISTBase

Original MNIST handwritten digits dataset.

The MNIST database contains 70,000 grayscale images of handwritten digits (0-9). Each image is 28x28 pixels, with the digit centered in the frame to reduce the preprocessing needed before training.

Dataset split:

  • 60,000 training images

  • 10,000 test images

Labels:

  • 0-9: Corresponding digits

Initialize an MNIST-style dataset.

Parameters:
  • path – Directory containing the IDX files

  • dtype – Data type to convert images to

class qualia_core.dataset.MNIST.FashionMNIST(path: str = '', dtype: str = 'float32')[source]

Bases: MNISTBase

Fashion-MNIST clothing dataset.

A drop-in replacement for MNIST, containing 70,000 grayscale images of clothing items. Each image is 28x28 pixels, following the same format as original MNIST.

Dataset split:

  • 60,000 training images

  • 10,000 test images

Labels:

  • 0: T-shirt/top

  • 1: Trouser

  • 2: Pullover

  • 3: Dress

  • 4: Coat

  • 5: Sandal

  • 6: Shirt

  • 7: Sneaker

  • 8: Bag

  • 9: Ankle boot
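When inspecting predictions, the label table above can be kept as a simple lookup from class index to name. The names `FASHION_MNIST_LABELS` and `label_name` below are hypothetical helpers for illustration and are not provided by qualia_core.

```python
# Class names indexed by their integer label (0 to 9), as listed above.
FASHION_MNIST_LABELS = (
    'T-shirt/top',
    'Trouser',
    'Pullover',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot',
)


def label_name(index: int) -> str:
    """Return the clothing category for an integer label in the range 0-9."""
    if not 0 <= index < len(FASHION_MNIST_LABELS):
        raise ValueError(f'label must be in 0-9, got {index}')
    return FASHION_MNIST_LABELS[index]
```

For example, `label_name(7)` returns `'Sneaker'`.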

Initialize an MNIST-style dataset.

Parameters:
  • path – Directory containing the IDX files

  • dtype – Data type to convert images to