Yearly Model Data
Word embedding statistics for each year of the Mahua corpus. Each year includes three model types: Word2Vec, FastText, and BERT.
1959 Model Statistics
Corpus: 11 files (February to December 1959; January is not included)
Word2Vec
| Vocabulary Size | ~5,800 words |
| Vector Size | 100 dimensions |
| File | 1959_model_data_word2vec.json |
FastText
| Vocabulary Size | ~5,800 words |
| Vector Size | 100 dimensions |
| File | 1959_model_data_fasttext.json |
BERT
| Sentence Count | ~580 sentences |
| Embedding Size | 768 dimensions |
| File | 1959_model_data_bert.json |
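The vector sizes listed above (100 dimensions for Word2Vec and FastText, 768 for BERT) can be sanity-checked after loading a file. The following is a minimal sketch, not the project's own tooling. It assumes each file is a JSON object mapping a token or sentence to a list of floats. This page does not document the JSON layout, so that schema is a guess, and the function names `load_vectors` and `mismatched_dimensions` are hypothetical.

```python
import json
from pathlib import Path


def load_vectors(path):
    """Load a model data file.

    ASSUMPTION: the file is a JSON object of the form {"token": [float, ...]}.
    The real schema is not documented on this page and may differ.
    """
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def mismatched_dimensions(vectors, expected_dim):
    """Return the keys whose vector length differs from expected_dim."""
    return [key for key, vec in vectors.items() if len(vec) != expected_dim]


# Tiny in-memory stand-in for a loaded file (made-up values, not corpus data).
sample = {"a": [0.0] * 100, "b": [0.0] * 100, "c": [0.0] * 99}
bad_keys = mismatched_dimensions(sample, expected_dim=100)
print(bad_keys)
```

With a real file, the call would look like `mismatched_dimensions(load_vectors("1959_model_data_word2vec.json"), 100)`, provided the assumed schema holds. An empty result means every vector has the stated size.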
Special: Rationality Subcorpus (1959_04)
jf78 (April 1959, first issue): a focused analysis of rationality concepts
Word2Vec (jf78)
| Vocabulary Size | 1,330 words |
| Vector Size | 100 dimensions |
| File | 1959_04_1_jf78_model_data_word2vec.json |
FastText (jf78)
| Vocabulary Size | 1,330 words |
| Vector Size | 100 dimensions |
| File | 1959_04_1_jf78_model_data_fasttext.json |
BERT (jf78)
| Sentence Count | 366 sentences |
| Embedding Size | 768 dimensions |
| File | 1959_04_1_jf78_model_data_bert.json |
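The file names listed above follow a regular pattern, which makes it easy to build them programmatically. The sketch below reproduces that pattern. Reading `04` as the month and `1` as the issue number is an inference from the "April 1959, first issue" description, and the helper names `yearly_filename` and `issue_filename` are hypothetical.

```python
# Model type suffixes as they appear in the file names on this page.
MODEL_TYPES = ("word2vec", "fasttext", "bert")


def _check_model(model):
    """Reject model suffixes that do not appear in the listed file names."""
    if model not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model!r}")


def yearly_filename(year, model):
    """Build a yearly file name, e.g. 1959_model_data_word2vec.json."""
    _check_model(model)
    return f"{year}_model_data_{model}.json"


def issue_filename(year, month, issue, issue_id, model):
    """Build a single-issue file name, e.g. 1959_04_1_jf78_model_data_bert.json.

    ASSUMPTION: the second field is a zero-padded month and the third is the
    issue number within that month.
    """
    _check_model(model)
    return f"{year}_{month:02d}_{issue}_{issue_id}_model_data_{model}.json"


# List the three yearly files and the three jf78 files named on this page.
for model in MODEL_TYPES:
    print(yearly_filename(1959, model))
    print(issue_filename(1959, 4, 1, "jf78", model))
```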
Sample Data Preview
Sample Word2Vec vectors from 1959 data:
Download All Yearly Model Data
Get all model data files (Word2Vec, FastText, BERT) for every year in a single download.