disambiguate

Disambiguate is a tool for training and using state-of-the-art neural WSD models.


[Visualization: contributors and lines of code (227), from 2018-11-02 to 2020-04-27]

About getalp/disambiguate

Disambiguate is a Python toolkit for training and evaluating neural word sense disambiguation (WSD) models. The project implements state-of-the-art neural approaches to the task of assigning the correct sense to each ambiguous word in context, based on research published by Vial, Lecouteux, and Schwab. The toolkit supports training new models from scratch on annotated corpora in the UFSAC format, and it provides pre-trained models that can be downloaded and used immediately to disambiguate raw text.

The toolkit offers flexible configuration of the model architecture and input representations. It supports multiple embedding approaches, including ELMo as well as BERT and other transformer-based language models from Hugging Face. Users can choose between LSTM and transformer encoder architectures and customize hyperparameters such as batch size, learning rate, and layer dimensions. The system also introduces sense vocabulary compression techniques, which reduce the number of distinct word senses the model must predict by grouping related senses through WordNet semantic relationships. This improves both training efficiency and model performance.
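
To make the idea of sense vocabulary compression more concrete, here is a minimal sketch of one way senses could be grouped through hypernym links. It is an illustration only, not the toolkit's implementation: the hierarchy, the sense names, and the functions `path_to_root`, `necessary_nodes`, and `compress` are all hypothetical, and a real setup would draw its relations from WordNet. The assumption is that two senses can share a label as long as no single word needs to tell them apart.

```python
from itertools import combinations

# Toy hypernym links (child -> parent). Hypothetical data, not real WordNet.
HYPERNYM = {
    "mouse.animal": "rodent", "rat.animal": "rodent",
    "rodent": "mammal", "mammal": "animal", "animal": "entity",
    "mouse.device": "device", "keyboard.device": "device",
    "device": "artifact", "artifact": "entity",
}

# Senses that each word can take (hypothetical inventory).
WORD_SENSES = {
    "mouse": ["mouse.animal", "mouse.device"],
    "rat": ["rat.animal"],
    "keyboard": ["keyboard.device"],
}


def path_to_root(node, hypernym):
    """Return the chain [node, parent, grandparent, ...] up to the root."""
    path = [node]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path


def necessary_nodes(word_senses, hypernym):
    """Collect the nodes needed to keep the senses of each word distinct."""
    keep = set()
    for senses in word_senses.values():
        for a, b in combinations(senses, 2):
            path_a = path_to_root(a, hypernym)
            path_b = path_to_root(b, hypernym)
            common = set(path_a) & set(path_b)
            for path in (path_a, path_b):
                # Position of the lowest common ancestor on this path.
                lca = next((i for i, n in enumerate(path) if n in common),
                           len(path))
                # Keep the node just below it (or the sense itself).
                keep.add(path[max(lca - 1, 0)])
    return keep


def compress(word_senses, hypernym):
    """Map every sense to its closest kept ancestor, or to its root."""
    keep = necessary_nodes(word_senses, hypernym)
    mapping = {}
    for senses in word_senses.values():
        for sense in senses:
            path = path_to_root(sense, hypernym)
            mapping[sense] = next((n for n in path if n in keep), path[-1])
    return mapping


compressed = compress(WORD_SENSES, HYPERNYM)
```

In this toy inventory, the four original sense labels collapse into two ("animal" and "artifact"), and the two senses of "mouse" still receive different labels, so a prediction over the smaller vocabulary can still be mapped back to one sense per word.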

The project is primarily aimed at NLP researchers and practitioners working on word sense disambiguation. Beyond the toolkit itself, it provides pre-trained models, trained on SemCor and the WordNet Gloss Corpus with BERT embeddings, and supports evaluation on standard WSD benchmarks. The implementation requires Java and Maven alongside its Python dependencies. It relies on the UFSAC corpus format for standardized data handling and for evaluation against multiple SemEval and Senseval test sets.
