Download Data
Download the complete Mahua Word Embeddings dataset. Files are organized by type and year. Data is provided in JSON, CSV, and TXT formats, and interactive visualizations are provided as HTML.
Complete Dataset
Download everything in a single archive:
| Description | Size | Format |
|---|---|---|
| Complete Dataset (all years, all models) | ~20 MB | .zip |
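After downloading the complete archive, you can extract it and summarize its contents by file extension with a short script like the one below. This is a minimal sketch. The archive name `mahua_complete_dataset.zip` and the output directory `mahua_data` are hypothetical placeholders, so substitute the name of the file you actually downloaded.

```python
import zipfile
from collections import Counter
from pathlib import Path


def extract_and_summarize(archive: str, dest: str) -> Counter:
    """Extract a .zip archive into `dest` and count its files by extension."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
        # Skip directory entries; count only regular files by suffix (e.g. ".json").
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return Counter(Path(n).suffix.lower() for n in names)


# Example (hypothetical file name; use the archive you actually downloaded):
# summary = extract_and_summarize("mahua_complete_dataset.zip", "mahua_data")
# for ext, count in summary.most_common():
#     print(ext or "(no extension)", count)
```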
Model Data by Year
Word embeddings (Word2Vec, FastText, BERT) for each year of the corpus.
Every year follows the same structure as 1955 and contains:
- `*_model_data_word2vec.json`
- `*_model_data_fasttext.json`
- `*_model_data_bert.json`
- `*_model_data_word2vec_fasttext_bert.json`
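Because the per-year files share a naming pattern, you can locate them and group them by prefix and by the models named in each file. The sketch below makes two assumptions. First, the `*` in the pattern is a year-style prefix such as `1955`. Second, model names follow `_model_data_` and are joined by underscores, as in the `*_model_data_word2vec_fasttext_bert.json` file. The internal JSON schema is not described on this page, so the sketch only finds and parses the files and does not interpret their contents.

```python
import json
import re
from pathlib import Path

# Assumed naming: <prefix>_model_data_<model>[_<model>...].json
KNOWN_MODELS = ("word2vec", "fasttext", "bert")
PATTERN = re.compile(r"^(?P<prefix>.+?)_model_data_(?P<models>[a-z0-9_]+)\.json$")


def parse_model_filename(name: str):
    """Return (prefix, [models]) for a model-data file name, or None if it does not match."""
    m = PATTERN.match(name)
    if m is None:
        return None
    models = m.group("models").split("_")
    if not all(model in KNOWN_MODELS for model in models):
        return None
    return m.group("prefix"), models


def find_model_files(root: str) -> dict:
    """Map each prefix (assumed to be a year) to {model combination: path} under `root`."""
    index: dict = {}
    for path in Path(root).rglob("*_model_data_*.json"):
        parsed = parse_model_filename(path.name)
        if parsed is None:
            continue
        prefix, models = parsed
        index.setdefault(prefix, {})["+".join(models)] = path
    return index


def load_json(path: Path):
    """Load one file as-is; its structure is not documented here."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```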
Rationality Analysis Data
Similarity networks for 7 rationality-related concepts (1959_04 jf78).
Each concept is computed with 6 similarity methods: cosine, euclidean, manhattan, jaccard, pearson, and spearman.
Formats: CSV (tabular), JSON (network plot), HTML (interactive visualization)
Embedding Visualizations
Dimensionality reduction plots (2D/3D) using PCA, t-SNE, and UMAP.
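PCA is the simplest of the three methods, and a short sketch of it follows. The sketch assumes the embeddings are already in a NumPy array with one row per word. This page does not specify how to build that array from the JSON files. t-SNE and UMAP are usually computed with dedicated libraries, such as scikit-learn's `TSNE` and the `umap-learn` package, rather than by hand.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2):
    """Project row vectors (assumed: one embedding per row) onto their top principal components.

    Returns (coords, explained_variance_ratio).
    """
    X = np.asarray(embeddings, dtype=float)
    # PCA operates on centered data: subtract the per-dimension mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T
    variance = S ** 2
    ratio = variance[:n_components] / variance.sum()
    return coords, ratio
```

Pass `n_components=3` to get coordinates for a 3D plot. The explained-variance ratio shows how much of the embedding geometry each plotted axis retains.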
Corpus Files
Original text files organized by year.
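The following minimal sketch reads the corpus year by year. The directory layout is an assumption: one subdirectory per year (for example `corpus/1955/`) containing UTF-8 `.txt` files. Adjust the path logic to match the files you download. Tokenization here is plain whitespace splitting, which is used only for illustration.

```python
from collections import defaultdict
from pathlib import Path


def load_corpus_by_year(root: str) -> dict:
    """Return {year_dir_name: [text, ...]} for .txt files under root/<year>/ (assumed layout)."""
    corpus = defaultdict(list)
    for path in sorted(Path(root).glob("*/*.txt")):
        year = path.parent.name
        corpus[year].append(path.read_text(encoding="utf-8"))
    return dict(corpus)


def token_counts(corpus: dict) -> dict:
    """Count whitespace-separated tokens per year (a deliberately naive tokenizer)."""
    return {year: sum(len(doc.split()) for doc in docs) for year, docs in corpus.items()}
```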
GitHub Repository
The complete dataset is available on GitHub. You can clone the repository or download specific files: