About the models & data

    The word2vec models provided in this demo consist of two skipgram models trained with FastText.
  • The first model (SkipGram_Newspaper) is trained on data taken from the TS TimeLine Corpus. A selection of news covering the same time period as the social media data was extracted from the corpus, matching the data size used for the social media model.

    The data consists of 65k news articles/columns, amounting to ~24 million tokens.

  • The second model (SkipGram_Social_Media) is trained on data taken from the Kemik Natural Language Processing Group at Yıldız Technical University and can be accessed via this page.

    The data consists of 20 million tweets, amounting to 24+ million tokens.