Description:
A simple technique for dimensionality reduction is feature hashing. The idea is to apply a hash function to each feature of a high-dimensional vector to determine the dimension that feature maps to in a reduced space. Feature hashing has been used successfully to reduce the dimensionality of the bag-of-words (BOW) model for texts, and to classify mail as spam or ham. To mitigate the effect of hash collisions, a second hash function can be used to determine the sign of each feature, so that colliding features tend to cancel rather than accumulate. Therefore, if we apply feature hashing to the word co-occurrence matrix, we obtain an embedding in which the inner products between the embedded vectors approximate the inner products between the original vectors in the co-occurrence matrix.
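The signed hashing step can be illustrated with a minimal sketch. It assumes each row of the co-occurrence matrix is stored as a sparse mapping from context word to count. The function names (`hash_features`, `dot`), the choice of MD5 as the hash, and the seed strings are illustrative assumptions, not part of the project specification.

```python
import hashlib
from typing import Dict, List


def _hash(token: str, seed: str) -> int:
    """Deterministic 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.md5((seed + ":" + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_features(features: Dict[str, float], n_buckets: int) -> List[float]:
    """Map a sparse feature vector into n_buckets dimensions with signed hashing."""
    out = [0.0] * n_buckets
    for name, value in features.items():
        # First hash function: picks the target dimension in the reduced space.
        index = _hash(name, "index") % n_buckets
        # Second hash function: picks the sign, so collisions tend to cancel.
        sign = 1.0 if _hash(name, "sign") % 2 == 0 else -1.0
        out[index] += sign * value
    return out


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two dense vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Two hypothetical co-occurrence rows (context word -> count).
row_cat = {"pet": 4.0, "fur": 2.0, "purr": 3.0}
row_dog = {"pet": 5.0, "fur": 1.0, "bark": 6.0}

# Embed both rows in a 64-dimensional space and compare inner products.
emb_cat = hash_features(row_cat, 64)
emb_dog = hash_features(row_dog, 64)
approx_inner = dot(emb_cat, emb_dog)
```

Here `approx_inner` estimates the inner product of the original rows. The estimate is exact when no two features collide, and it gets noisier as the number of buckets shrinks relative to the number of distinct features.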
Deliverables: