Deliverable 2: Autoencoders (AEs) to generate Devanagari characters.

Data Collection:

  • Dataset is available at: https://www.kaggle.com/rishianand/devanagari-character-set
  • It comprises 92,000 images (32x32 px) of 46 characters: the consonants "ka" to "gya" and the digits 0 to 9.
  • 33 original Devanagari letters + 3 Nepali letters + 10 digits = 46 characters
  • Vowels are not included.
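Each 32x32 image flattens to 32 * 32 = 1024 values, which is the `in_features=1024` of the fully connected models below. The helper below is a minimal NumPy sketch of that preprocessing, assuming 8-bit grayscale pixels. Scaling to [-1, 1] is an assumption chosen to match the Tanh output layer of Models 1 and 2; `to_model_input` is a hypothetical name, not taken from the project code.

```python
import numpy as np

def to_model_input(images):
    """Flatten a stack of 32x32 uint8 images into 1024-vectors scaled to [-1, 1].

    Hypothetical helper: the [-1, 1] range is assumed to match a Tanh decoder output.
    """
    images = np.asarray(images, dtype=np.float32)
    if images.shape[1:] != (32, 32):
        raise ValueError("expected an array of shape (N, 32, 32)")
    flat = images.reshape(len(images), 32 * 32)  # 32 * 32 = 1024 features per image
    return flat / 127.5 - 1.0                    # pixel 0 -> -1.0, pixel 255 -> +1.0

# Fake batch of two images: one all black, one all white
batch = np.stack([np.zeros((32, 32), np.uint8), np.full((32, 32), 255, np.uint8)])
x = to_model_input(batch)
print(x.shape)  # (2, 1024)
```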

Autoencoder models

Basic autoencoders (one built from a simple fully connected ANN and one from a CNN) and a Variational Autoencoder (VAE) are implemented.
To examine how well a basic AE can generate new Devanagari characters, we sampled random vectors from a unit Gaussian, used them as
latent vectors of the basic AE, and decoded them. The same experiment was performed by sampling from the latent distribution of the VAE,
and the results were compared.
The PyTorch implementation can be found in devnagari-classification.zip. Experiment-specific images are listed in the Notes column below.
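The sampling experiment described above can be sketched as follows: draw z ~ N(0, I) in the 307-dimensional bottleneck and pass it through a decoder. In this sketch the decoder is an untrained stand-in with the same layer sizes as the Model 1 decoder; in the real experiment the trained model's decoder would be used instead. `sample_characters` and the fixed seed are illustrative assumptions.

```python
import torch
from torch import nn

LATENT_DIM = 307  # bottleneck width of Model 1 and Model 3

# Untrained stand-in with Model 1's decoder layer sizes (assumption for illustration)
decoder = nn.Sequential(
    nn.Linear(307, 409), nn.ReLU(),
    nn.Linear(409, 614), nn.ReLU(),
    nn.Linear(614, 819), nn.ReLU(),
    nn.Linear(819, 921), nn.ReLU(),
    nn.Linear(921, 1024), nn.Tanh(),
)

@torch.no_grad()
def sample_characters(decoder, n, latent_dim=LATENT_DIM, seed=0):
    """Decode n random unit-Gaussian latent vectors into n 32x32 images."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(n, latent_dim, generator=gen)  # z ~ N(0, I)
    return decoder(z).view(n, 1, 32, 32)

images = sample_characters(decoder, 16)
print(images.shape)  # torch.Size([16, 1, 32, 32])
```

With a trained basic AE this is where the noisy outputs appear; with the VAE, the same call on its decoder is how new characters are generated.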

Sr# | Model description | Notes
Model 1 (basic AE, fully connected)
  
AutoEncoder(
  (encoder): Encoder(
    (encoder): Sequential(
      (0): Linear(in_features=1024, out_features=921, bias=True)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=921, out_features=819, bias=True)
      (3): ReLU(inplace=True)
      (4): Linear(in_features=819, out_features=614, bias=True)
      (5): ReLU(inplace=True)
      (6): Linear(in_features=614, out_features=409, bias=True)
      (7): ReLU(inplace=True)
      (8): Linear(in_features=409, out_features=307, bias=True)
      (9): ReLU(inplace=True)
    )
  )
  (decoder): Decoder(
    (decoder): Sequential(
      (0): Linear(in_features=307, out_features=409, bias=True)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=409, out_features=614, bias=True)
      (3): ReLU(inplace=True)
      (4): Linear(in_features=614, out_features=819, bias=True)
      (5): ReLU(inplace=True)
      (6): Linear(in_features=819, out_features=921, bias=True)
      (7): ReLU(inplace=True)
      (8): Linear(in_features=921, out_features=1024, bias=True)
      (9): Tanh()
    )
  )
) 
Code:
basic_ae_devnagari_1.ipynb
Sample images:
basic_ae_devnagari_1_devnagari_decoded.zip
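A minimal PyTorch sketch that rebuilds the Model 1 architecture from the printout above, so that `print(model)` gives the same module tree. The layer widths and activations are read off the printout; the MSE loss, the Adam optimizer, the learning rate and the random placeholder batch are assumptions for illustration and are not taken from basic_ae_devnagari_1.ipynb.

```python
import torch
from torch import nn

# Layer widths read off the printed Model 1 architecture
WIDTHS = [1024, 921, 819, 614, 409, 307]

class Encoder(nn.Module):
    def __init__(self, widths=WIDTHS):
        super().__init__()
        layers = []
        for d_in, d_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU(inplace=True)]
        self.encoder = nn.Sequential(*layers)  # ends with ReLU, as printed

    def forward(self, x):
        return self.encoder(x)

class Decoder(nn.Module):
    def __init__(self, widths=WIDTHS):
        super().__init__()
        rev = widths[::-1]
        layers = []
        for i, (d_in, d_out) in enumerate(zip(rev[:-1], rev[1:])):
            layers.append(nn.Linear(d_in, d_out))
            # ReLU between hidden layers, Tanh on the final 1024-wide output
            is_last = i == len(rev) - 2
            layers.append(nn.Tanh() if is_last else nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(*layers)

    def forward(self, z):
        return self.decoder(z)

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One illustrative training step (loss, optimizer and data are assumptions)
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(8, 1024) * 2 - 1  # placeholder batch in [-1, 1]
recon = model(x)
loss = nn.functional.mse_loss(recon, x)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```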
Model 2 (basic AE, CNN)
    

autoencoder(
  (encoder): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(2, 2), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 8, kernel_size=(3, 3), stride=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  )
  (decoder): Sequential(
    (0): ConvTranspose2d(8, 18, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace=True)
    (2): ConvTranspose2d(18, 8, kernel_size=(3, 3), stride=(2, 2))
    (3): ReLU(inplace=True)
    (4): ConvTranspose2d(8, 1, kernel_size=(2, 2), stride=(3, 3))
    (5): Tanh()
  )
)
Code:
basic_cnn_autoencoder_devnagari_4.ipynb
Sample images:
basic_cnn_devnagari_autoencoder_4_decoded.zip
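The printout does not show tensor shapes, so it is worth checking that Model 2 really maps a 32x32 input back to 32x32. The sketch below applies the output-size formulas from the PyTorch documentation for Conv2d/MaxPool2d (with ceil_mode=False) and ConvTranspose2d, one spatial dimension at a time, using the kernel, stride and padding values printed above. `conv_out` and `convT_out` are helper names introduced here for illustration.

```python
def conv_out(h, k, s=1, p=0, d=1):
    """Output size of Conv2d / MaxPool2d (ceil_mode=False) along one spatial dim."""
    return (h + 2 * p - d * (k - 1) - 1) // s + 1

def convT_out(h, k, s=1, p=0, d=1, output_padding=0):
    """Output size of ConvTranspose2d along one spatial dim."""
    return (h - 1) * s - 2 * p + d * (k - 1) + output_padding + 1

h = 32
trace = [h]
# Encoder: Conv2d(k3, s2, p2) -> MaxPool2d(k2, s2) -> Conv2d(k3, s2) -> MaxPool2d(k2, s1)
for k, s, p in [(3, 2, 2), (2, 2, 0), (3, 2, 0), (2, 1, 0)]:
    h = conv_out(h, k, s, p)
    trace.append(h)
latent_size = h
# Decoder: ConvTranspose2d layers with (kernel, stride) = (3, 2), (3, 2), (2, 3)
for k, s in [(3, 2), (3, 2), (2, 3)]:
    h = convT_out(h, k, s)
    trace.append(h)

print(trace)                                  # [32, 17, 8, 3, 2, 5, 11, 32]
print(8 * latent_size * latent_size)          # 32 latent values per image
```

So the Model 2 bottleneck is 8 channels x 2 x 2 = 32 values, far narrower than the 307-dimensional bottleneck of Models 1 and 3, and the last transposed convolution (kernel 2, stride 3) is what brings the output back to exactly 32x32.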
Model 3 (VAE)
VAE(
  (encoder): VAEEncoder(
    (fc1): Linear(in_features=1024, out_features=819, bias=True)
    (fc2): Linear(in_features=819, out_features=512, bias=True)
    (mu): Linear(in_features=512, out_features=307, bias=True)
    (var): Linear(in_features=512, out_features=307, bias=True)
  )
  (decoder): VAEDecoder(
    (fc1): Linear(in_features=307, out_features=512, bias=True)
    (fc2): Linear(in_features=512, out_features=819, bias=True)
    (out): Linear(in_features=819, out_features=1024, bias=True)
  )
)

Code:
vae_devnagari_1.ipynb
Sample images:
vae_devnagari_1_decoded.zip
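A minimal sketch of Model 3 under stated assumptions. A module printout lists only the layers, not what forward() does with them, so the ReLU activations, the sigmoid output (which assumes pixels scaled to [0, 1], unlike the Tanh outputs of Models 1 and 2), reading the `var` head as a log-variance, and the BCE + KL loss are all assumptions, not details from vae_devnagari_1.ipynb. The sketch shows the reparameterization trick and the closed-form KL term that pulls the latent distribution towards a unit Gaussian.

```python
import torch
from torch import nn
import torch.nn.functional as F

class VAEEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1024, 819)
        self.fc2 = nn.Linear(819, 512)
        self.mu = nn.Linear(512, 307)
        self.var = nn.Linear(512, 307)  # assumed to predict log-variance

    def forward(self, x):
        h = F.relu(self.fc2(F.relu(self.fc1(x))))  # ReLU is an assumption
        return self.mu(h), self.var(h)

class VAEDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(307, 512)
        self.fc2 = nn.Linear(512, 819)
        self.out = nn.Linear(819, 1024)

    def forward(self, z):
        h = F.relu(self.fc2(F.relu(self.fc1(z))))
        return torch.sigmoid(self.out(h))  # assumes pixels scaled to [0, 1]

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = VAEEncoder()
        self.decoder = VAEDecoder()

    def forward(self, x):
        mu, logvar = self.encoder(x)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term: binary cross-entropy summed over all pixels
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kl, kl

model = VAE()
x = torch.rand(4, 1024)  # placeholder batch in [0, 1]
recon, mu, logvar = model(x)
loss, kl = vae_loss(recon, x, mu, logvar)
```

The KL term is what distinguishes this model from Model 1: it penalizes latent codes that drift away from N(0, I), which is why decoding unit-Gaussian samples is meaningful for the VAE.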

Conclusion

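Devanagari characters generated with the VAE are more realistic. The basic AE mostly produces random noise when decoding random
Gaussian vectors, because its latent space is not trained to follow any known distribution, so it cannot be sampled directly to obtain
meaningful data. The VAE's latent space, in contrast, is regularized towards a unit Gaussian, so random unit-Gaussian samples fall in
regions that its decoder has learned to map to characters.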
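For the basic AEs there is also a concrete structural reason. Model 1's encoder ends with a ReLU, and Model 2's encoder ends with a ReLU followed by max pooling, so every latent code seen during training is non-negative. A unit-Gaussian sample, however, has roughly half of its coordinates negative, so the decoder is fed inputs unlike anything it was trained on. The NumPy sketch below only illustrates this mismatch for the 307-dimensional bottleneck of Model 1; it does not reproduce the experiment itself, and the sample count and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((10_000, 307))  # 10,000 unit-Gaussian latent samples

# Trained Model 1 codes live in the non-negative orthant (encoder ends with ReLU).
frac_negative_coords = (z < 0).mean()            # close to 0.5
frac_fully_nonneg = (z >= 0).all(axis=1).mean()  # expected 2**-307, effectively 0

print(round(frac_negative_coords, 2))  # 0.5
print(frac_fully_nonneg)               # 0.0
```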