Info Guide for power users

Introducing CQL

This article will explain how to use advanced search. Advanced search is available in all four corpora on this site: CNG, CMG, CGS and CGL.

There is one major difference between simple search and advanced search. In simple search, you are allowed to type any Irish word in the search box, and the search engine will try to understand what you want and give an answer. Advanced search is not like that. In advanced search, you have to write the search query in a special notation called CQL (Corpus Query Language).

BACKGROUND

CQL was conceived in the 1990s at the University of Stuttgart. Since then, different versions of CQL have been implemented in different software. The version of CQL used on this site is the same as CQL in Sketch Engine.

First look at CQL

CQL is a notation for writing patterns. A search returns every sequence of words in the corpus that matches the pattern.

EXAMPLE PATTERN

[lemma="cuir"] [tag="Ncfpc"]

Explanation: you are looking for a sequence of two words:

  1. The first word is a word with the lemma cuir.
  2. The second word is a word with the grammatical tag Ncfpc (that is, it is a feminine plural common noun in the nominative case).

In CQL, each pair of square brackets [...] corresponds to one word. Inside each pair of square brackets, you describe the criteria that word must meet, based on the word's attributes: the lemma, the grammar tag, and so on.
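As a rough mental model (not a description of how the search engine is really built), you can think of each pair of square brackets as a test applied to one word, and of the whole query as a list of tests applied to consecutive words. The Python sketch below assumes hypothetical tokens stored as dictionaries; the function name find_matches, the sample sentence, and all tags except Ncfpc are made up for illustration.

```python
# Hypothetical tokens: each word carries the attributes a query can test.
# Apart from Ncfpc, the tags are simplified placeholders, not real corpus tags.
tokens = [
    {"word": "Cuir", "lemma": "cuir", "tag": "V"},
    {"word": "ceisteanna", "lemma": "ceist", "tag": "Ncfpc"},
    {"word": "orthu", "lemma": "ar", "tag": "Pr"},
]


def find_matches(tokens, pattern):
    """Return the start positions where each test in `pattern`
    is satisfied by the corresponding consecutive token."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(test(tok) for test, tok in zip(pattern, window)):
            hits.append(start)
    return hits


# One test per pair of square brackets in [lemma="cuir"] [tag="Ncfpc"].
pattern = [
    lambda t: t["lemma"] == "cuir",
    lambda t: t["tag"] == "Ncfpc",
]

matches = find_matches(tokens, pattern)
print(matches)
```

Here the only match starts at position 0, because Cuir satisfies the first test and ceisteanna, the word immediately after it, satisfies the second.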

Word attributes

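These are the attributes that can be used inside the square brackets. The ones that appear in the examples in this article are:

  1. word: the form of the word as it appears in the text
  2. lemma: the lemma the word belongs to
  3. tag: the grammatical tag of the word

Those attributes are case sensitive. To search case-insensitively, these additional attributes are available:

  1. lc: the word form, compared without regard to case
  2. lemma_lc: the lemma, compared without regard to case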

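EXAMPLE

[lc="cruinn"]

Explanation: you are looking for any word whose form is cruinn, regardless of capitalization.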

Regular expressions in CQL

Regular expressions can be used inside the quotation marks. They provide special symbols that help you find words whose attribute values match certain patterns.

One useful symbol is the dot . which stands for any single character.

EXAMPLE

If you search for [lemma="ma."], you will find words with three characters in their lemma attribute:

  1. at first, m
  2. after that, a
  3. at the end, any character

Results: mac, mag, mar and others

Another useful symbol is the question mark ? which indicates that the preceding character is optional.

EXAMPLE

If you search for [word="ch?eist"], you will find words with this in their word attribute:

  1. at first, c
  2. after that, optionally, h
  3. at the end, eist

Results: ceist and cheist

A third useful symbol is the asterisk * which stands for any number of repetitions of the preceding character, including none at all.

EXAMPLE

If you search for [word="geal*ta"], you will find words with this in their word attribute:

  1. at first, gea
  2. after that, any number (zero included) of l
  3. at the end, ta

Results: geata, gealta, geallta

The dot and the asterisk can be combined as .* which stands for any number of any characters.

EXAMPLE

If you search for [lemma="ceist.*"], you will find words with this in their lemma attribute:

  1. at first, ceist
  2. at the end, any number (zero included) of any character

Results: ceist, ceistneoir, ceistiúchán and others
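If you want to experiment with these symbols away from the corpus, the four patterns above behave the same way in most regular expression tools. The sketch below uses Python's re.fullmatch, which requires the whole string to match; that mirrors the way the examples above describe the whole attribute value. The candidate strings are hand-picked for illustration (some are included precisely because they should not match), and this is an analogy, not the search engine's own code.

```python
import re

# Each pattern from the examples above, paired with hand-picked candidates.
examples = {
    "ma.": ["mac", "mag", "mar", "ma", "mara"],
    "ch?eist": ["ceist", "cheist", "gceist"],
    "geal*ta": ["geata", "gealta", "geallta", "gealt"],
    "ceist.*": ["ceist", "ceistneoir", "ceistiúchán", "freagra"],
}

# Keep only the candidates that the pattern matches in full.
results = {
    pattern: [value for value in values if re.fullmatch(pattern, value)]
    for pattern, values in examples.items()
}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Note that ma and mara are rejected by ma. because the dot stands for exactly one character, no fewer and no more.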

This is just a taste of what can be done with regular expressions. To learn more, read the article Searching with regular expressions.

Negative matching

You can use != instead of = to find words that do not have a particular attribute value.

EXAMPLE

[lemma="cuir"] [lemma!="ceist"]

Explanation: you are looking for a sequence of two words:

  1. The first word is a word belonging to the lemma cuir.
  2. The second word is a word belonging to any lemma other than ceist.

Combinations of criteria

Inside each pair of square brackets, you can give more than one criterion, joined by the conjunctions & and |.

The conjunction & means “and”: the preceding criterion and the following criterion must both be met.

EXAMPLE

[lemma="inis" & tag="V.*"]

Explanation: you are looking for a word with the lemma inis and a tag beginning with V (that is, a verb).

The conjunction | means “or”: either the preceding criterion or the following criterion (or both) must be met.

EXAMPLE

[lemma="fear" | lemma="bean"]

Explanation: you are looking for a word with the lemma fear or the lemma bean.

More complex criteria can be formulated by grouping criteria with round brackets (...).

EXAMPLE

[(lemma="inis" | lemma="oileán") & tag="N.*"]

Explanation: you are looking for a word that has the lemma inis or the lemma oileán, and that also has a tag beginning with N (that is, a noun).
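To see how the grouping works, it can help to spell the same logic out as an ordinary boolean test. The Python sketch below is only an illustration: the function name meets_criteria and the sample tokens are hypothetical, and the tags are simplified placeholders.

```python
import re


def meets_criteria(token):
    """Sketch of [(lemma="inis" | lemma="oileán") & tag="N.*"]."""
    # The bracketed group: either lemma is acceptable.
    lemma_ok = token["lemma"] == "inis" or token["lemma"] == "oileán"
    # The tag must start with N and may continue with anything.
    tag_ok = re.fullmatch("N.*", token["tag"]) is not None
    # Both parts joined by & must hold.
    return lemma_ok and tag_ok


# Hypothetical tokens; the tags are placeholders for illustration only.
candidates = [
    {"word": "inis", "lemma": "inis", "tag": "Nf"},        # noun: kept
    {"word": "insíonn", "lemma": "inis", "tag": "V"},      # verb: rejected
    {"word": "oileáin", "lemma": "oileán", "tag": "Nm"},   # noun: kept
    {"word": "farraige", "lemma": "farraige", "tag": "Nf"},  # wrong lemma
]

kept = [t["word"] for t in candidates if meets_criteria(t)]
print(kept)
```

The verb insíonn is rejected even though its lemma is inis, because the tag test applies to the bracketed group as a whole.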

Number of words

Normally, each pair of square brackets [...] corresponds to one word in the search result. That can be changed with curly brackets {...}.

EXAMPLE

[tag="V.*"]{2}

Explanation: you are looking for a sequence of two words whose tag starts with V – that is, two verbs in a row.

This is the same as [tag="V.*"] [tag="V.*"]

EXAMPLE

[word=".*ach"]{2,4}

Explanation: you are looking for a sequence of two, three or four consecutive words, each ending in ach.

The expression {2,4} can be read as "at least two, at most four".

This is especially handy in combination with [] which means "any word":

EXAMPLE

[lemma="cuir"] []{0,3} [lemma="ceist"]

Explanation: you are looking for a sequence of words containing:

  1. a word belonging to the lemma cuir
  2. then up to three other words
  3. and then a word belonging to the lemma ceist
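The three steps above can be sketched in Python as a search over a list of lemmas. This is a simplified model with a hypothetical function name (find_with_gap) and a made-up sample sentence; the real search engine may report overlapping matches differently.

```python
def find_with_gap(lemmas, first, last, max_gap):
    """Return (start, end) index pairs where `first` is followed by `last`
    with at most `max_gap` other words in between."""
    spans = []
    for i, lemma in enumerate(lemmas):
        if lemma != first:
            continue
        # The closing word may sit 1 to max_gap + 1 positions after the first.
        for j in range(i + 1, min(i + max_gap + 2, len(lemmas))):
            if lemmas[j] == last:
                spans.append((i, j))
    return spans


# Lemmas of a made-up sentence: "Chuir sé an cheist sin".
lemmas = ["cuir", "sé", "an", "ceist", "sin"]

# Sketch of [lemma="cuir"] []{0,3} [lemma="ceist"].
spans = find_with_gap(lemmas, "cuir", "ceist", max_gap=3)
print(spans)
```

In this sentence two words stand between cuir and ceist, so the pattern matches with a limit of three but would not match if the limit were {0,1}.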

Constraining the search to one sentence

It sometimes happens that you get a result for your search that spans two sentences.

EXAMPLE

[lemma="ceist"] []{0,10} [lemma="freagra"]

Some of the results of this query will cross a sentence boundary, with the first word in one sentence and the last word in the next.

If you are looking at the search results in the Concordance layout, you will see that sentence boundaries are marked with the symbols <s> (= beginning of a sentence) and </s> (= end of a sentence).

To constrain the search to results inside a single sentence, add the clause within <s/> to the end of your query. This clause indicates that the entire pattern must be matched within a single block marked with <s> and </s>.

EXAMPLE

[lemma="ceist"] []{0,10} [lemma="freagra"] within <s/>
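The effect of within <s/> can be pictured as an extra filter on the results. The sketch below assumes a hypothetical representation in which every word carries the number of the sentence it belongs to; the data and variable names are made up, and the search engine itself works with the <s> and </s> markers instead.

```python
# Hypothetical data: one lemma per word, plus the sentence each word is in.
lemmas = ["ceist", "deacair", "freagra", "simplí", "ceist", "agus", "freagra"]
sentence_ids = [0, 0, 1, 1, 2, 2, 2]
MAX_GAP = 10  # the upper limit in []{0,10}

# Sketch of [lemma="ceist"] []{0,10} [lemma="freagra"] without the clause.
spans = [
    (i, j)
    for i, first in enumerate(lemmas)
    if first == "ceist"
    for j in range(i + 1, min(i + MAX_GAP + 2, len(lemmas)))
    if lemmas[j] == "freagra"
]

# Adding "within <s/>": keep a span only if both ends are in one sentence.
same_sentence = [
    (start, end)
    for start, end in spans
    if sentence_ids[start] == sentence_ids[end]
]

print(spans)
print(same_sentence)
```

Without the filter there are three results, two of which begin in one sentence and end in another; with the filter only the result that sits entirely inside the last sentence remains.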

Going from simple search to advanced search and back

It was said at the beginning of this article that there is one difference between simple search and advanced search: in simple search, you can input any Irish word, and the search engine will try to understand what you are looking for.

How does the search engine do that? Behind the scenes, every simple search is converted into a CQL query, and it is that CQL query that is actually run.

EXAMPLE

If you are in simple search and you do a broad search for a single word, for example cruinn, your query will be translated into this CQL query:

[lc="cruinn"|lemma_lc="cruinn"]

That is, a (case-insensitive) search will be made for all words whose form or lemma equals cruinn.

EXAMPLE

If you are doing a narrow search for cruinn, here is the CQL that will be created behind the scenes:

[lc="cruinn"]

That is, a (case-insensitive) search will be made for all words whose form (regardless of lemma) is equal to cruinn.
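The two translations above follow a simple template, which can be sketched as a small Python function. The function name simple_to_cql and the lowercasing step are assumptions made for illustration; the site's actual conversion is not documented here and may handle special characters or multi-word searches differently.

```python
def simple_to_cql(word, broad=True):
    """Build the CQL shown above for a one-word simple search.

    Assumption: the word is lowercased before being placed in the query,
    since lc and lemma_lc are the case-insensitive attributes.
    """
    value = word.lower()
    if broad:
        # Broad search: match either the word form or the lemma.
        return f'[lc="{value}"|lemma_lc="{value}"]'
    # Narrow search: match the word form only.
    return f'[lc="{value}"]'


print(simple_to_cql("cruinn"))
print(simple_to_cql("cruinn", broad=False))
```

The first call reproduces the broad query and the second the narrow query from the examples above.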

If you are on a search results page in simple search, you can see the CQL query behind your search at any time by clicking on the Advanced search link beside the search box. You will be taken to advanced search where the CQL query is visible.