In a typical document classification task, the input to the machine learning algorithm (during both training and classification) is text. From this, a bag-of-words (BOW) model can be constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature of every document in both the training and test sets. Feature hashing avoids storing this token-to-index dictionary by applying a hash function to each token and using the result, modulo the vector length N, as the feature index. A hashing vectorizer can be defined as follows [from Wikipedia]:

function hashing_vectorizer(features : array of string, N : integer):
    x := new vector[N]
    for f in features:
        h := hash(f)
        x[h mod N] += 1
    return x
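
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual implementation: the helper name stable_hash and the choice of MD5 are assumptions made here so that results are reproducible, since Python's built-in hash() for strings is randomized between interpreter runs by default.

```python
import hashlib


def stable_hash(token: str) -> int:
    """Hypothetical helper: deterministic hash of a token.

    MD5 is an assumption chosen only for reproducibility; any
    well-distributed hash function could be substituted.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def hashing_vectorizer(features, n):
    """Map a list of tokens to a count vector of fixed length n."""
    x = [0] * n
    for f in features:
        h = stable_hash(f)
        # Tokens whose hashes collide modulo n share the same slot.
        x[h % n] += 1
    return x


tokens = "the cat sat on the mat".split()
vec = hashing_vectorizer(tokens, 8)
print(vec)
```

The vector always has length n and its entries sum to the number of tokens, whatever the vocabulary size. A smaller n saves memory at the cost of more collisions between distinct tokens.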