Word embeddings

This application demonstrates how word embeddings can be used to find similar words. Here, two words count as similar if they occur in the same or similar contexts in the National Corpus of Irish.
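Each word is represented as a vector, and "similar" in practice means that two vectors point in nearly the same direction. Gensim's most_similar ranks candidate words by cosine similarity to the query word. The following is a minimal sketch of that ranking idea in plain Python, not Gensim's actual implementation. The helper names `cosine` and `most_similar` and the three-dimensional toy vectors are invented for illustration; the real embeddings have many more dimensions and are learned from the corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors, word, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical toy vectors, made up for this sketch only.
toy_vectors = {
    "teach": [0.9, 0.1, 0.0],
    "tigh":  [0.8, 0.2, 0.1],
    "siopa": [0.5, 0.5, 0.1],
    "madra": [0.0, 0.2, 0.9],
}

for pair in most_similar(toy_vectors, "teach", topn=2):
    print(pair)
```

With these toy vectors, "tigh" ranks first and "siopa" second, because their vectors point in directions closest to that of "teach".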

Downloadable word embeddings

Licence and attribution

How to use

This code sample shows how to load and query the word embeddings in Python using the Gensim library.
import gensim

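# load the word vectors from the plain-text .vec file
# (limit=100000 reads only the first 100,000 vectors, which saves memory):
wv = gensim.models.KeyedVectors.load_word2vec_format('cng-fasttext.vec', binary=False, limit=100000)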

# find ten words most similar to 'teach':
similars = wv.most_similar('teach', topn=10)
for similar in similars:
    print(similar)
Output:
('tigh', 0.9031928181648254)
('seanteach', 0.773318350315094)
('mbaile', 0.7576225996017456)
('tigín', 0.753011167049408)
('séipéal', 0.7515964508056641)
('teachín', 0.7445628643035889)
('pub', 0.7366455793380737)
('scioból', 0.7314869165420532)
('siopa', 0.7245514988899231)
('bhothán', 0.7238678336143494)
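Querying a word that is not among the loaded vectors makes Gensim raise a KeyError. With limit=100000, this includes any word outside the first 100,000 entries of the file, and the .vec text format stores one vector per whole word, so no vector can be built for an unseen word. The following is a minimal sketch of a guard, assuming `wv` is the KeyedVectors object loaded above. The helper name `similar_words` is hypothetical.

```python
def similar_words(wv, word, topn=10):
    """Return up to `topn` (word, score) pairs for `word`.

    `wv` is expected to behave like Gensim's KeyedVectors: it supports the
    `in` operator for vocabulary membership and has a most_similar() method.
    Returns an empty list instead of raising KeyError for unknown words.
    """
    if word not in wv:
        # word was not loaded (rare word, or outside the `limit`)
        return []
    return wv.most_similar(word, topn=topn)

# Usage with the vectors loaded above (hypothetical helper):
# for pair in similar_words(wv, 'teach', topn=5):
#     print(pair)
```

Returning an empty list keeps a loop over many query words running when some of them are missing. Catching KeyError around most_similar would work equally well.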