About the models & data
-
The word2vec models provided in this demo consist of two
skip-gram models trained with FastText.
-
The first model (SkipGram_Newspaper) is trained on data taken from the TS TimeLine Corpus.
A selection of news covering the same time period was extracted from the corpus, matching the data size used for the social media model.
The data consists of 65k news articles and columns, roughly 24 million tokens.
-
The second model (SkipGram_Social_Media) is trained on data taken from the Kemik Natural Language Processing Group, Yıldız Technical University, and can be accessed via this page.
The data consists of 20 million tweets, over 24 million tokens.