Description:
In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is text. From this, a bag of words (BOW) model can be constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature of each of the documents in both the training and test sets.
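The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the whitespace tokenizer, the helper names (`tokenize`, `build_vocabulary`, `bow_vector`), and the sample sentences are hypothetical and are not taken from the project.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real tokenizers do more.
    return text.lower().split()


def build_vocabulary(train_docs):
    # Each distinct token in the training set gets the next free feature index.
    vocab = {}
    for doc in train_docs:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def bow_vector(doc, vocab):
    # Count tokens; tokens never seen in training have no feature and are dropped.
    counts = Counter(tokenize(doc))
    vec = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vec[vocab[token]] = n
    return vec


# Hypothetical example documents.
train_docs = ["John likes to watch movies", "Mary likes movies too"]
vocab = build_vocabulary(train_docs)
train_matrix = [bow_vector(d, vocab) for d in train_docs]
test_vector = bow_vector("John also likes football", vocab)
```

Note that the vocabulary dictionary grows with the number of distinct training tokens, and test-set tokens outside it ("also", "football") are silently ignored.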
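Instead of maintaining a dictionary from tokens to feature indices, feature hashing applies a hash function to each token and uses the result, modulo the vector length N, as the feature index. A hashing vectorizer can be defined as follows (from Wikipedia):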
function hashing_vectorizer(features : array of string, N : integer):
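Only the signature is shown above. As a hedged sketch of what such a function typically does, the Python version below allocates a zero vector of length N, hashes each feature, and increments the entry at the hash value modulo N. The helper `stable_hash` and the choice of MD5 are assumptions made for this illustration, not part of the project.

```python
import hashlib


def stable_hash(token):
    # Assumption: MD5 serves only as a convenient deterministic hash here.
    # Python's built-in hash() is salted per process for strings, so it would
    # give different indices on different runs.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hashing_vectorizer(features, n):
    # Fixed-length count vector; no vocabulary dictionary is stored.
    x = [0] * n
    for f in features:
        x[stable_hash(f) % n] += 1
    return x


# Hypothetical example tokens.
tokens = "john likes to watch movies".split()
vec = hashing_vectorizer(tokens, 8)
```

Because N is fixed in advance, distinct tokens can collide in the same index. The trade-off is constant memory and no need to see the whole training set before vectorizing.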
Deliverables: