Corpas na Gaeilge Labhartha (CLG) is a corpus of spoken Irish comprising 9 million words. It contains two types of text: transcripts of spoken material (e.g. radio shows from Raidió na Gaeltachta, folklore interviews) and texts written to be read aloud (e.g. television scripts, songs, prayers).
It is hoped that this data will be useful for the advanced learner, the lexicographer and the linguist. CLG is currently an unbalanced corpus, and more spoken material will be added in the future.