This article will explain how to use advanced search. Advanced search is available in all four corpora on this site: CNG, CMG, CGS and CGL.
There is one major difference between simple search and advanced search. In simple search, you are allowed to type any Irish word in the search box, and the search engine will try to understand what you want and give an answer. Advanced search is not like that. In advanced search, you have to write the search query in a special notation called CQL (Corpus Query Language).
BACKGROUND
CQL was conceived in the 1990s at the University of Stuttgart. Since then, different versions of CQL have been implemented in different software. The version of CQL used on this site is the same as CQL in Sketch Engine.
CQL is a notation for writing patterns. Its purpose is to find sequences of words that match a given pattern.
EXAMPLE PATTERN
[lemma="cuir"] [tag="Ncfpc"]
Explanation: you are looking for a sequence of two words:
- The first word is a word with the lemma cuir.
- The second word is a word with the grammatical tag Ncfpc (that is, it is a feminine plural common noun in the nominative case).
In CQL, each pair of square brackets [...] corresponds to one word. Inside each pair of square brackets, you describe the criteria that word must meet, based on the word's attributes: the lemma, the grammar tag, and so on.
These are the attributes that can be used inside the square brackets.
word = the form of the word
Example: [word="bhéaloidis"]
lemma = the lemma of the word
Example: [lemma="béaloideas"]
tag = the word's grammar tag
Example: [tag="Ncmsg"]
These attributes are case-sensitive. To search case-insensitively, these additional attributes are available:
lc = word in lower-case letters
lemma_lc = lemma in lower-case letters
EXAMPLE
[word="baile"]
This will find baile but not Baile.
[word="Baile"]
This will find Baile but not baile.
[lc="baile"]
This will find both baile and Baile.
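The difference between word and lc can be pictured with a small Python sketch. The token records and the find helper are hypothetical, invented here for illustration; they are not how the search engine stores or looks up words.

```python
# Hypothetical token record; the keys mirror the CQL attributes described above.
def make_token(word, lemma):
    return {
        "word": word,
        "lemma": lemma,
        "lc": word.lower(),         # word in lower-case letters
        "lemma_lc": lemma.lower(),  # lemma in lower-case letters
    }

tokens = [make_token("baile", "baile"), make_token("Baile", "baile")]

def find(tokens, attribute, value):
    """Return the word forms of tokens whose attribute equals value exactly."""
    return [t["word"] for t in tokens if t[attribute] == value]

print(find(tokens, "word", "baile"))  # ['baile']
print(find(tokens, "word", "Baile"))  # ['Baile']
print(find(tokens, "lc", "baile"))    # ['baile', 'Baile']
```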
Inside the quotation marks you can use regular expressions: patterns built from special symbols that help you find words whose attributes match a certain shape.
One useful symbol is the dot . which stands for any single character.
EXAMPLE
If you search for [lemma="ma."], you will find words with three characters in their lemma attribute:
- first, m
- after that, a
- at the end, any character
Results: mac, mag, mar and others
Another useful symbol is the question mark ? which indicates that the preceding character is optional.
EXAMPLE
If you search for [word="ch?eist"], you will find words with this in their word attribute:
- first, c
- after that, optionally, h
- at the end, eist
Results: ceist and cheist
A third useful symbol is the asterisk * which stands for any number (zero included) of the preceding character.
EXAMPLE
If you search for [word="geal*ta"], you will find words with this in their word attribute:
- first, gea
- after that, any number (zero included) of l
- at the end, ta
Results: geata, gealta, geallta
These two symbols, the dot and the asterisk, can be combined as .* which represents any number of any characters.
EXAMPLE
If you search for [lemma="ceist.*"], you will find words with this in their lemma attribute:
- first, ceist
- at the end, any number (zero included) of any characters
Results: ceist, ceistneoir, ceistiúchán and others
This is just a taste of what can be done with regular expressions. To learn more, read the article Searching with regular expressions.
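The four patterns above can be tried out in Python as a rough stand-in. This sketch assumes that a CQL pattern is compared against the whole attribute value, which is what the three-character result for ma. suggests, and so it uses re.fullmatch. Python's regular expression dialect may differ from the site's in finer details, and the word list is invented for illustration.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# Invented sample values, standing in for word or lemma attributes.
words = ["mac", "mag", "mar", "ma", "mara", "ceist", "cheist",
         "geata", "gealta", "geallta", "ceistneoir", "ceistiúchán"]

print(matching(r"ma.", words))      # ['mac', 'mag', 'mar']
print(matching(r"ch?eist", words))  # ['ceist', 'cheist']
print(matching(r"geal*ta", words))  # ['geata', 'gealta', 'geallta']
print(matching(r"ceist.*", words))  # ['ceist', 'ceistneoir', 'ceistiúchán']
```

Note that ma and mara are rejected by ma. because the dot stands for exactly one character.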
You can use != instead of = to find words that do not have a particular attribute value.
EXAMPLE
[lemma="cuir"] [lemma!="ceist"]
Explanation: you are looking for a sequence of two words:
- The first word is a word belonging to the lemma cuir.
- The second word is a word belonging to any lemma other than ceist.
Inside each pair of square brackets, you can give more than one criterion, joined by the conjunctions & and |.
The conjunction & means “and”: the preceding criterion and the following criterion must both be met.
EXAMPLE
[lemma="inis" & tag="V.*"]
Explanation: you are looking for a word with the lemma inis and a tag beginning with V (that is, a verb).
The conjunction | means “or”: either the preceding criterion or the following criterion (or both) must be met.
EXAMPLE
[lemma="fear" | lemma="bean"]
Explanation: you are looking for a word with the lemma fear or the lemma bean.
Complex criteria can be formulated by grouping criteria with round brackets ( ).
EXAMPLE
[(lemma="inis" | lemma="oileán") & tag="N.*"]
Explanation: you are looking for a word that has the lemma inis or the lemma oileán, and has a tag beginning with N (that is, a noun).
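The way criteria combine inside one pair of square brackets can be sketched in Python. The helper names (criterion, both, either) are hypothetical, the sample tokens are invented, and the tags attached to them are illustrative values rather than output from the corpora.

```python
import re

def criterion(attribute, pattern, negate=False):
    """Build a test for one criterion, e.g. lemma="inis" or lemma!="ceist"."""
    def test(token):
        hit = re.fullmatch(pattern, token[attribute]) is not None
        return not hit if negate else hit
    return test

def both(a, b):
    """Plays the role of & : both criteria must be met."""
    return lambda token: a(token) and b(token)

def either(a, b):
    """Plays the role of | : at least one criterion must be met."""
    return lambda token: a(token) or b(token)

# Mirrors [(lemma="inis" | lemma="oileán") & tag="N.*"]
query = both(
    either(criterion("lemma", "inis"), criterion("lemma", "oileán")),
    criterion("tag", "N.*"),
)

# Invented tokens with illustrative tags.
tokens = [
    {"word": "inis", "lemma": "inis", "tag": "Ncfsc"},       # noun: matches
    {"word": "oileáin", "lemma": "oileán", "tag": "Ncmsg"},  # noun: matches
    {"word": "inis", "lemma": "inis", "tag": "Vmip"},        # verb: fails the tag test
    {"word": "fear", "lemma": "fear", "tag": "Ncmsc"},       # noun: fails the lemma test
]

print([query(t) for t in tokens])  # [True, True, False, False]
```

Without the round brackets the grouping would be ambiguous, which is why the query spells it out.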
Normally, each pair of square brackets [...] corresponds to one word in the search result. That can be changed with curly brackets {...}, which state how many times in a row the preceding pair of square brackets must match.
EXAMPLE
[tag="V.*"]{2}
Explanation: you are looking for a sequence of two words whose tag starts with V (that is, two verbs in a row). This is the same as
[tag="V.*"] [tag="V.*"]
EXAMPLE
[word=".*ach"]{2,4}
Explanation: you are looking for a sequence of two, three or four words in a row, each ending with ach. The expression {2,4} can be read as "at least two, at most four".
This is especially handy in combination with [] which means "any word":
EXAMPLE
[lemma="cuir"] []{0,3} [lemma="ceist"]
Explanation: you are looking for a sequence of words containing:
- a word belonging to the lemma cuir
- then up to three other words
- and then a word belonging to the lemma ceist
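A query of this shape, two fixed words with a bounded gap between them, can be sketched as a small Python search. This is a simplified illustration under stated assumptions, not the engine's algorithm: the find_spans function and the sample sentence are hypothetical, and a real engine may report overlapping alternatives differently.

```python
def find_spans(tokens, first_lemma, last_lemma, max_gap):
    """Return (start, end) token index pairs, end inclusive, for every match."""
    spans = []
    for start, token in enumerate(tokens):
        if token["lemma"] != first_lemma:
            continue
        # Try every allowed gap size: 0, 1, ..., max_gap intervening words.
        for gap in range(max_gap + 1):
            end = start + 1 + gap
            if end < len(tokens) and tokens[end]["lemma"] == last_lemma:
                spans.append((start, end))
    return spans

# Invented sample sentence: "Chuir sé an cheist orm"
tokens = [
    {"word": "Chuir", "lemma": "cuir"},
    {"word": "sé", "lemma": "sé"},
    {"word": "an", "lemma": "an"},
    {"word": "cheist", "lemma": "ceist"},
    {"word": "orm", "lemma": "ar"},
]

# Mirrors [lemma="cuir"] []{0,3} [lemma="ceist"]
print(find_spans(tokens, "cuir", "ceist", max_gap=3))  # [(0, 3)]
# With a gap of at most one word, the two intervening words are too many.
print(find_spans(tokens, "cuir", "ceist", max_gap=1))  # []
```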
It sometimes happens that you get a result for your search that spans two sentences.
EXAMPLE
[lemma="ceist"] []{0,10} [lemma="freagra"]
Some of the results from this query will extend over two sentences, with the first word in one sentence and the last word in the next.
If you are looking at the search results in the Concordance layout, you will see that sentence boundaries are marked with the symbols <s> (= beginning of a sentence) and </s> (= end of a sentence).
To constrain the search to results inside a single sentence, add the clause within <s/> to the end of your query. This clause indicates that the entire pattern must be matched within a single block marked with <s> and </s>.
EXAMPLE
[lemma="ceist"] []{0,10} [lemma="freagra"] within <s/>
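The effect of the within <s/> clause can be pictured as a filter over candidate matches. In this Python sketch the data layout is an assumption made for illustration: each token carries the number of the sentence it belongs to, and a match is a (start, end) pair of token positions.

```python
def within_sentence(spans, sentence_ids):
    """Keep only spans whose first and last token lie in the same sentence."""
    return [(s, e) for s, e in spans if sentence_ids[s] == sentence_ids[e]]

# Hypothetical data: sentence_ids[i] is the number of the sentence containing token i.
# Tokens 0-3 form the first sentence, tokens 4-6 the second.
sentence_ids = [0, 0, 0, 0, 1, 1, 1]

# Two candidate matches: one inside the first sentence, one crossing the boundary.
spans = [(1, 3), (2, 5)]

print(within_sentence(spans, sentence_ids))  # [(1, 3)]
```

Because sentence numbers only increase along the text, comparing the first and last token is enough to know that every token in between is in the same sentence.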
It was said at the beginning of this article that there is one difference between simple search and advanced search: in simple search, you can input any Irish word, and the search engine will try to understand what you are looking for.
How does the search engine do that? Behind the scenes, every simple search is converted to a CQL query, and it is that CQL query that is actually run.
EXAMPLE
If you are in simple search and you do a broad search for a single word, for example cruinn, your query will be translated into this CQL query:
[lc="cruinn"|lemma_lc="cruinn"]
That is, a (case-insensitive) search will be made for all words whose form or lemma equals cruinn.
EXAMPLE
If you are doing a narrow search for cruinn, here is the CQL that will be created behind the scenes:
[lc="cruinn"]
That is, a (case-insensitive) search will be made for all words whose form (regardless of lemma) is equal to cruinn.
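The translation from a single-word simple search to CQL can be sketched as a small Python function. The function name simple_to_cql is hypothetical, and so is the step of lower-casing the input. The broad form follows the query shown above; the narrow form is inferred from the explanation that only the word form is searched. The sketch does not cover multi-word searches and does not escape quotation marks or regular expression symbols in the input.

```python
def simple_to_cql(term, broad=True):
    """Build a CQL query for a single-word simple search (illustrative only)."""
    value = term.lower()  # assumption: lc and lemma_lc hold lower-case values
    if broad:
        # Broad search: the word form or the lemma may equal the term.
        return f'[lc="{value}"|lemma_lc="{value}"]'
    # Narrow search: only the word form is compared, regardless of lemma.
    return f'[lc="{value}"]'

print(simple_to_cql("cruinn"))               # [lc="cruinn"|lemma_lc="cruinn"]
print(simple_to_cql("cruinn", broad=False))  # [lc="cruinn"]
```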
If you are on a search results page in simple search, you can see the CQL query behind your search at any time by clicking on the Advanced search link beside the search box. You will be taken to advanced search where the CQL query is visible.