Yearly Model Data

Word embedding statistics for each year of the Mahua corpus. Each year includes three model types: Word2Vec, FastText, and BERT.

1959 Model Statistics

Corpus: 11 files (February-December 1959; no January file)

Word2Vec
Vocabulary Size: ~5,800 words (estimated)
Vector Size: 100 dimensions
File: 1959_model_data_word2vec.json

FastText
Vocabulary Size: ~5,800 words
Vector Size: 100 dimensions
File: 1959_model_data_fasttext.json

BERT
Sentence Count: ~580 sentences
Embedding Size: 768 dimensions
File: 1959_model_data_bert.json

Special: Rationality Subcorpus (1959_04)

jf78 (April 1959, 1st issue): a focused analysis of rationality concepts

Word2Vec (jf78)
Vocabulary Size: 1,330 words
Vector Size: 100 dimensions
File: 1959_04_1_jf78_model_data_word2vec.json

FastText (jf78)
Vocabulary Size: 1,330 words
Vector Size: 100 dimensions
File: 1959_04_1_jf78_model_data_fasttext.json

BERT (jf78)
Sentence Count: 366 sentences
Embedding Size: 768 dimensions
File: 1959_04_1_jf78_model_data_bert.json

Sample Data Preview

Sample Word2Vec vectors from 1959 data:
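
A preview of this kind can also be produced locally with a short Python sketch. The layout of the JSON file is not documented on this page, so the sketch assumes (hypothetically) that 1959_model_data_word2vec.json is a single JSON object mapping each word to a list of floats. The helper names load_vectors and preview are illustrative only; adjust them if the real file is structured differently.

```python
import json
from pathlib import Path


def load_vectors(path):
    """Load a JSON file ASSUMED to map word -> list of floats (unverified schema)."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping words to vectors")
    return data


def preview(vectors, n_words=5, n_dims=5):
    """Return the first n_words entries, each cut to n_dims rounded components."""
    rows = []
    for word, vec in list(vectors.items())[:n_words]:
        rows.append((word, [round(float(x), 4) for x in vec[:n_dims]]))
    return rows


if __name__ == "__main__":
    # File name taken from the 1959 Word2Vec entry on this page.
    path = Path("1959_model_data_word2vec.json")
    if path.exists():
        vectors = load_vectors(path)
        print(f"{len(vectors)} words loaded")
        for word, head in preview(vectors):
            print(word, head, "...")
    else:
        print(f"{path} not found; download it first")
```

Truncating each vector to its first few components keeps the output readable, since a full Word2Vec entry has 100 dimensions.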

Download All Yearly Model Data

Get all model data files (Word2Vec, FastText, BERT) for every year in a single download.
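
After unpacking the download, the per-year files can be located by name. The yearly file names listed on this page follow a <year>_model_data_<model>.json pattern; the sketch below assumes that this pattern holds for every year and that all files sit in one directory. Both points are assumptions, and the function names are hypothetical.

```python
from pathlib import Path

# Model suffixes as they appear in the file names listed on this page.
MODEL_TYPES = ("word2vec", "fasttext", "bert")


def yearly_model_files(year, root="."):
    """Map each model type to its expected path (ASSUMED naming pattern)."""
    return {m: Path(root) / f"{year}_model_data_{m}.json" for m in MODEL_TYPES}


def missing_models(year, root="."):
    """List the model types whose expected file is absent under root."""
    return [m for m, p in yearly_model_files(year, root).items() if not p.exists()]


if __name__ == "__main__":
    for model, path in yearly_model_files(1959).items():
        status = "found" if path.exists() else "missing"
        print(f"{model}: {path.name} ({status})")
```

The subcorpus files (for example 1959_04_1_jf78_model_data_word2vec.json) carry an extra issue prefix and are not covered by this pattern.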