WSABIE (Weston et al., 2011) is a supervised
bilinear embedding model. Each word and tag has
an embedding. The words in a text are averaged
to produce an embedding of the text, and hashtags
are ranked by similarity to the text embedding.
That is, the model is of the form:
$f(w, t) = w^\top U^\top V t$
where the post w is represented as a bag of words
(a sparse vector in $\mathbb{R}^N$), the tag is a one-hot vector
in $\mathbb{R}^N$, and U and V are $k \times N$ embedding matrices.
The WARP loss, as described in section 3, is
used for training.
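The bilinear scoring and ranking above can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the sizes, seed, and random initialization are our own assumptions:

```python
import numpy as np

# Illustrative sizes (not from the paper): vocabulary N, embedding dim k.
N, k = 1000, 64
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(k, N))  # word embedding matrix, k x N
V = rng.normal(scale=0.1, size=(k, N))  # hashtag embedding matrix, k x N

def score(w, t):
    # Bilinear score f(w, t) = w^T U^T V t for a bag-of-words post w
    # and a one-hot tag t, both vectors in R^N.
    return w @ U.T @ V @ t

def rank_tags(w):
    # Embed the post as the average of its word embeddings (U w),
    # then rank all N tags by similarity to that text embedding.
    text_emb = U @ w
    scores = V.T @ text_emb
    return np.argsort(-scores)  # tag indices, highest score first
```

In practice the parameters U and V would be learned with the WARP loss rather than left random; the sketch only shows how the scoring factorizes into a text embedding step followed by a similarity ranking over all tags.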
Performance of all these models at hashtag prediction
is summarized in Tables 3 and 4. We find
similar results for both datasets. The frequency
and #words baselines perform poorly across the board, establishing the need to learn from text.
Among the learning models, the unsupervised
word2vec performs the worst. We attribute this
to its lack of supervision: training with supervision
directly optimizes the metric we evaluate.
#TAGSPACE outperforms WSABIE at all dimensionalities.
Due to the relatively large test sets,
the results are statistically significant; for example,
#TAGSPACE (64 dim) beats WSABIE (64
dim) on the page dataset 56% of the time and
draws 23% of the time in terms of the rank metric,
a difference that is statistically significant under a
Wilcoxon signed-rank test.
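As an illustration of the kind of paired, per-post comparison behind these numbers, here is a sketch on synthetic rank data (the arrays, sizes, and seed are ours, not the paper's) using SciPy's Wilcoxon signed-rank test:

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic per-post ranks of the true hashtag under two models;
# illustrative only, not the paper's actual test-set ranks.
rng = np.random.default_rng(1)
ranks_tagspace = rng.integers(1, 50, size=1000)
ranks_wsabie = ranks_tagspace + rng.integers(0, 10, size=1000)

# Fraction of posts where the first model wins, and draws (equal ranks).
wins = np.mean(ranks_tagspace < ranks_wsabie)
draws = np.mean(ranks_tagspace == ranks_wsabie)

# Paired Wilcoxon signed-rank test on per-post rank differences;
# tied pairs (draws) are discarded by the default zero_method.
stat, p = wilcoxon(ranks_tagspace, ranks_wsabie)
print(f"wins: {wins:.0%}, draws: {draws:.0%}, p = {p:.2g}")
```

The test is paired because both models are evaluated on the same posts, and it uses only the sign and magnitude of the per-post rank differences, so draws carry no evidence either way.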
Example predictions of #TAGSPACE for some
constructed posts are given in Table 2.
We also show the nearest word embeddings to the
posts. Training data was collected at the time of
the Pax winter storm, which explains the predictions
for the first post, and Kevin Spacey appears in the
show “House of Cards.” In all cases the hashtags
reveal labels that capture the semantics of the
posts, not just the syntactic similarity of individual
words.