Deliverable 2: Autoencoder (AE) to generate Devanagari characters.
Data Collection:
- Dataset is available at: https://www.kaggle.com/rishianand/devanagari-character-set
- It comprises 92,000 images (32x32 px) covering 46 characters: the consonants "ka" to "gya" and the digits 0 to 9.
- 33 original Devanagari letters + 3 Nepali letters + 10 digits = 46 characters
- The vowels are missing.
Autoencoder models
Basic autoencoders (one built from fully connected layers, one from convolutional layers) and a variational autoencoder (VAE) are
implemented. To examine how well a basic AE can generate new Devanagari characters, we sampled random vectors from a unit Gaussian,
treated them as points in the latent space of the basic AE, and decoded them. The same experiment was repeated with the VAE by
sampling from its latent distribution, and the results are analyzed.
The PyTorch implementation can be found at: devnagari-classification.zip
Experiment-specific images can be found in the Notes column below.
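The sampling experiment described above can be sketched as follows. This is a minimal illustration, not the code from the notebooks: `build_decoder` and `sample_and_decode` are hypothetical names, the decoder only mirrors the Model 1 layer sizes listed in the table below, and its weights are untrained, so the decoded images here carry no meaning. With a trained decoder the same call produces the generated samples.

```python
import torch
from torch import nn

LATENT_DIM = 307   # latent width of Model 1 (see the table below)
IMAGE_SIDE = 32    # dataset images are 32x32 px


def build_decoder() -> nn.Sequential:
    """Hypothetical stand-in that mirrors the Model 1 decoder layer sizes."""
    widths = [LATENT_DIM, 409, 614, 819, 921, IMAGE_SIDE * IMAGE_SIDE]
    layers = []
    for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
        layers.append(nn.Linear(n_in, n_out))
        is_last = i == len(widths) - 2
        # ReLU between hidden layers, Tanh on the output, as in the printout
        layers.append(nn.Tanh() if is_last else nn.ReLU(inplace=True))
    return nn.Sequential(*layers)


def sample_and_decode(decoder: nn.Module, n_samples: int, seed: int = 0) -> torch.Tensor:
    """Draw z ~ N(0, I) and push it through the decoder."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(n_samples, LATENT_DIM, generator=gen)
    decoder.eval()
    with torch.no_grad():
        flat = decoder(z)
    # Reshape flat 1024-vectors back into single-channel images
    return flat.view(n_samples, 1, IMAGE_SIDE, IMAGE_SIDE)


decoder = build_decoder()  # untrained; load trained weights in practice
images = sample_and_decode(decoder, n_samples=8)
print(images.shape)
```

Because the output layer is Tanh, decoded pixel values lie in [-1, 1] and need rescaling before they are saved as images.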
Sr# | Model description | Notes |
Model 1 |
AutoEncoder(
(encoder): Encoder(
(encoder): Sequential(
(0): Linear(in_features=1024, out_features=921, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=921, out_features=819, bias=True)
(3): ReLU(inplace=True)
(4): Linear(in_features=819, out_features=614, bias=True)
(5): ReLU(inplace=True)
(6): Linear(in_features=614, out_features=409, bias=True)
(7): ReLU(inplace=True)
(8): Linear(in_features=409, out_features=307, bias=True)
(9): ReLU(inplace=True)
)
)
(decoder): Decoder(
(decoder): Sequential(
(0): Linear(in_features=307, out_features=409, bias=True)
(1): ReLU(inplace=True)
(2): Linear(in_features=409, out_features=614, bias=True)
(3): ReLU(inplace=True)
(4): Linear(in_features=614, out_features=819, bias=True)
(5): ReLU(inplace=True)
(6): Linear(in_features=819, out_features=921, bias=True)
(7): ReLU(inplace=True)
(8): Linear(in_features=921, out_features=1024, bias=True)
(9): Tanh()
)
)
)
|
Code:
basic_ae_devnagari_1.ipynb
Sample Images:
basic_ae_devnagari_1_devnagari_decoded.zip
|
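The Model 1 layer widths follow a regular pattern: each one equals the input size 1024 scaled by a fraction and truncated to an integer. Whether the notebook computes them this way is an assumption; the fractions below are inferred from the printout, and the variable names are illustrative only.

```python
INPUT_DIM = 32 * 32  # a flattened 32x32 image has 1024 values

# Assumption: fractions inferred from the printed layer sizes,
# not taken from the notebook itself.
FRACTIONS = [0.9, 0.8, 0.6, 0.4, 0.3]

encoder_widths = [INPUT_DIM] + [int(INPUT_DIM * f) for f in FRACTIONS]
decoder_widths = list(reversed(encoder_widths))

# (in_features, out_features) pairs in the order the Linear layers appear
encoder_layers = list(zip(encoder_widths[:-1], encoder_widths[1:]))
decoder_layers = list(zip(decoder_widths[:-1], decoder_widths[1:]))

print(encoder_layers)
print(decoder_layers)
```

The decoder mirrors the encoder, so the bottleneck of 307 values is about 30% of the input size.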
Model 2 |
autoencoder(
(encoder): Sequential(
(0): Conv2d(1, 16, kernel_size=(3, 3), stride=(2, 2), padding=(2, 2))
(1): ReLU(inplace=True)
(2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(3): Conv2d(16, 8, kernel_size=(3, 3), stride=(2, 2))
(4): ReLU(inplace=True)
(5): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
)
(decoder): Sequential(
(0): ConvTranspose2d(8, 18, kernel_size=(3, 3), stride=(2, 2))
(1): ReLU(inplace=True)
(2): ConvTranspose2d(18, 8, kernel_size=(3, 3), stride=(2, 2))
(3): ReLU(inplace=True)
(4): ConvTranspose2d(8, 1, kernel_size=(2, 2), stride=(3, 3))
(5): Tanh()
)
)
|
Code:
basic_cnn_autoencoder_devnagari_4.ipynb
Sample Images:
basic_cnn_devnagari_autoencoder_4_decoded.zip
|
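The unusual kernel, stride, and padding choices in Model 2 make sense once the spatial sizes are traced through the network. The sketch below applies the standard output-size formulas for convolution, pooling, and transposed convolution (dilation 1) to a 32x32 input. The helper names `conv_out` and `conv_transpose_out` are illustrative and not part of the notebook.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output side length of Conv2d / MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output side length of ConvTranspose2d (dilation 1, no output_padding)."""
    return (size - 1) * stride - 2 * padding + kernel


trace = []
side = 32                                          # input image: 1 x 32 x 32

# Encoder, layer by layer as in the Model 2 printout
side = conv_out(side, kernel=3, stride=2, padding=2)
trace.append(("Conv2d(1, 16)", 16, side))
side = conv_out(side, kernel=2, stride=2)
trace.append(("MaxPool2d", 16, side))
side = conv_out(side, kernel=3, stride=2)
trace.append(("Conv2d(16, 8)", 8, side))
side = conv_out(side, kernel=2, stride=1)
trace.append(("MaxPool2d", 8, side))
latent_values = 8 * side * side                    # size of the bottleneck

# Decoder
side = conv_transpose_out(side, kernel=3, stride=2)
trace.append(("ConvTranspose2d(8, 18)", 18, side))
side = conv_transpose_out(side, kernel=3, stride=2)
trace.append(("ConvTranspose2d(18, 8)", 8, side))
side = conv_transpose_out(side, kernel=2, stride=3)
trace.append(("ConvTranspose2d(8, 1)", 1, side))

for name, channels, s in trace:
    print(f"{name:<24} -> {channels} x {s} x {s}")
print("latent values:", latent_values)
```

The trace shows the encoder compressing each image to an 8 x 2 x 2 feature map (32 values), and the final transposed convolution restoring the original 32x32 resolution.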
Model 3 |
VAE(
(encoder): VAEEncoder(
(fc1): Linear(in_features=1024, out_features=819, bias=True)
(fc2): Linear(in_features=819, out_features=512, bias=True)
(mu): Linear(in_features=512, out_features=307, bias=True)
(var): Linear(in_features=512, out_features=307, bias=True)
)
(decoder): VAEDecoder(
(fc1): Linear(in_features=307, out_features=512, bias=True)
(fc2): Linear(in_features=512, out_features=819, bias=True)
(out): Linear(in_features=819, out_features=1024, bias=True)
)
)
|
Code:
vae_devnagari_1.ipynb
Sample Images:
vae_devnagari_1_decoded.zip
|
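The Model 3 printout lists only the layers, so the forward pass is not visible. The following is a sketch of how a VAE with these layer sizes is commonly wired, under several assumptions that the printout does not confirm: ReLU activations between layers, a sigmoid on the output, the `var` head interpreted as log-variance, pixel values in [0, 1], and a binary cross-entropy reconstruction loss. `VAESketch` and `vae_loss` are hypothetical names.

```python
import torch
from torch import nn
import torch.nn.functional as F


class VAESketch(nn.Module):
    """Illustrative VAE using the Model 3 layer sizes (1024-819-512-307)."""

    def __init__(self, in_dim=1024, h1=819, h2=512, latent=307):
        super().__init__()
        self.enc_fc1 = nn.Linear(in_dim, h1)
        self.enc_fc2 = nn.Linear(h1, h2)
        self.mu = nn.Linear(h2, latent)
        self.log_var = nn.Linear(h2, latent)   # assumption: head predicts log-variance
        self.dec_fc1 = nn.Linear(latent, h2)
        self.dec_fc2 = nn.Linear(h2, h1)
        self.out = nn.Linear(h1, in_dim)

    def encode(self, x):
        h = F.relu(self.enc_fc2(F.relu(self.enc_fc1(x))))
        return self.mu(h), self.log_var(h)

    def reparameterize(self, mu, log_var):
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and sigma
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        h = F.relu(self.dec_fc2(F.relu(self.dec_fc1(z))))
        return torch.sigmoid(self.out(h))       # assumption: pixels in [0, 1]

    def forward(self, x):
        mu, log_var = self.encode(x)
        return self.decode(self.reparameterize(mu, log_var)), mu, log_var


def vae_loss(recon, x, mu, log_var):
    """Reconstruction term plus KL divergence to the unit Gaussian prior."""
    recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon_loss + kl


torch.manual_seed(0)
model = VAESketch()
x = torch.rand(4, 1024)                          # stand-in batch of flattened images
recon, mu, log_var = model(x)
loss = vae_loss(recon, x, mu, log_var)

# Generation: sample from the prior and decode
with torch.no_grad():
    generated = model.decode(torch.randn(8, 307))
print(loss.item(), generated.shape)
```

The KL term pulls the encoded distributions toward a unit Gaussian, so at generation time latent vectors drawn from that same Gaussian fall in a region the decoder has been trained on.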
Conclusion
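Devanagari characters generated using the VAE are more realistic. The basic AE seems to generate random noise, as its latent space
is not constrained to follow a known distribution and therefore cannot be sampled with unit Gaussian vectors to produce meaningful data.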
|