After you've searched for something, you will come to a search results page. There are two versions of this page available: Sentences and Concordance. You can swap between the two versions and they are identical in terms of content: the only difference between them is the layout of the data on the screen.
In the rest of this article we will take a look at what the search results page offers.
In the Monitor Corpus of Irish, a timeline can be seen at the top of the page which shows the distribution of the results from year to year.
You can click on a year to narrow the search results down to that year. To remove the filter simply click on the same year again.
The timeline is only available in the Monitor Corpus of Irish because the goal of that corpus is to track how the language develops from year to year. It is not available in the other corpora.
If the search has returned more than ten results, a checkbox titled Concise sentences first will be available at the top of the search results list. If that box is ticked, sentences that the search algorithm thinks are "concise" will be promoted to the top of the list. Concise sentences are sentences that are relatively short, contain only simple ordinary words, do not contain numbers or unusual punctuation, and so on. The algorithm scores the sentences according to these criteria and puts the highest-scoring ones at the top of the list.
If you untick that box, you will get the search results in an arbitrary order, regardless of conciseness.
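To give a feel for how such scoring could work, here is a minimal Python sketch. It is not the site's actual algorithm, which is not documented here: the word list COMMON_WORDS, the penalty weights and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a list of "simple ordinary words".
# The real algorithm's word list and weights are not known.
COMMON_WORDS = {"tá", "an", "sé", "ag", "sa", "teach", "mór", "bhí", "agus", "ar"}

def conciseness_score(sentence, max_length=12):
    """Return a score where higher means more concise (illustrative only)."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return float("-inf")
    score = 0.0
    # Penalise every word beyond an assumed comfortable length.
    score -= max(0, len(words) - max_length)
    # Penalise words that are not in the common-word list.
    score -= sum(1 for w in words if w not in COMMON_WORDS)
    # Penalise digits.
    score -= 2 * len(re.findall(r"\d", sentence))
    # Penalise punctuation other than a few ordinary marks.
    score -= 2 * len(re.findall(r"[^\w\s.,?!'’-]", sentence))
    return score

def concise_first(sentences):
    """Sort sentences so that the highest-scoring ones come first."""
    return sorted(sentences, key=conciseness_score, reverse=True)
```

With this sketch, a short sentence made of everyday words such as "Tá an teach mór." would be listed ahead of a sentence full of numbers, brackets and rare words.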
Another way to tackle a long list of results is the Random sample checkbox. If you tick that box, you will get 40 sentences randomly selected from all the results. You can click the Reload button next to it to get another batch of 40.
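The idea behind the random sample can be sketched in a few lines of Python. This is only an illustration of sampling without repeats; the function name and the rng parameter are assumptions, and the site's own implementation may differ.

```python
import random

SAMPLE_SIZE = 40  # batch size mentioned in this article

def random_sample(results, size=SAMPLE_SIZE, rng=random):
    """Pick up to `size` results at random, without repeats.

    `results` is assumed to be a list; `rng` may be the random module
    or a random.Random instance (useful for reproducible batches).
    """
    return rng.sample(results, min(size, len(results)))
```

Calling random_sample again on the same list corresponds to clicking Reload: it draws a fresh batch, which may or may not overlap with the previous one.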
If you have searched for a word that has more than one part of speech, you will see options under the search box to narrow the search results down to a specific part of speech.
EXAMPLES
This feature is based on grammatical tagging of the words in the corpora. That tagging was done by an automatic program, so it's not always 100% correct. Unavoidably, you will sometimes see words here that are classified under the wrong part of speech.
More options are available in a panel that appears on the right-hand side of the screen (if you are in the Concordance layout or on a narrow screen, the panel needs to be opened first by clicking on it).
This panel becomes available if your search has returned more than ten results, and gives you various options to narrow the list of results down. That panel has three sections:
Word properties. This section provides a statistical overview of the words that correspond to your search: which forms they have, which lemmas they correspond to, which grammatical tags are associated with them. For more information on these things, read the article Words, lemmas and tags.
Collocations. In this section you can find information about any other words found near the words that correspond to your search: that is to say, information about the collocations of the words. More information about this can be found in the article Collocations.
Metadata. In all the corpora on this site, the texts are associated with various metadata that tell what type of text it is (newspaper article, radio programme, blog post...), what genre it belongs to (news, opinion, fiction...), whether it is a written text or a spoken text, and others. It is possible to obtain statistical information on such metadata in this section of the panel, and to filter the search results accordingly: for example, to narrow the list down to works of fiction only, or to exclude legal texts from the list.
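The kind of counting and filtering that the Metadata section performs can be sketched in Python as follows. The field names ("genre", "medium") and the sample records are hypothetical; they are not the site's actual data model.

```python
from collections import Counter

# Hypothetical search results, each carrying a little metadata.
RESULTS = [
    {"sentence": "sentence 1", "genre": "fiction", "medium": "written"},
    {"sentence": "sentence 2", "genre": "legal", "medium": "written"},
    {"sentence": "sentence 3", "genre": "news", "medium": "spoken"},
    {"sentence": "sentence 4", "genre": "fiction", "medium": "written"},
]

def metadata_counts(results, field):
    """Count how many results carry each value of a metadata field."""
    return Counter(r[field] for r in results)

def keep_only(results, field, value):
    """Narrow the list down to results where field equals value."""
    return [r for r in results if r[field] == value]

def exclude(results, field, value):
    """Drop results where field equals value."""
    return [r for r in results if r[field] != value]
```

For example, keep_only(RESULTS, "genre", "fiction") narrows the list down to works of fiction, while exclude(RESULTS, "genre", "legal") removes legal texts from it.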
There are three clickable buttons available next to each search result on the list.
Metadata. You can click this button to get an overview of the metadata related to the text from which this sentence was taken: author's name, title of the work, genre, publisher and others. Sometimes, if it is a text that is available in its entirety elsewhere on the internet, there is even a clickable URL available to access the full text.
Wide context. You can click this button to get a longer extract of the text: up to 75 words before and after the word that corresponds to your search. This limit (± 75 words) exists because we are not allowed, for copyright reasons, to show more than that amount from any text at a time.
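The ± 75-word window can be illustrated with a short Python sketch. It assumes that the text is already split into a list of words and that the search matched a single word at a known position; the function name is hypothetical.

```python
CONTEXT_WORDS = 75  # limit stated in this article

def wide_context(tokens, hit_index, window=CONTEXT_WORDS):
    """Return the hit word plus up to `window` words on either side."""
    start = max(0, hit_index - window)                # do not run past the start
    end = min(len(tokens), hit_index + window + 1)    # do not run past the end
    return tokens[start:end]
```

If the matching word is close to the beginning or end of the text, the extract is simply shorter on that side.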
Copy. This button is a quick way to copy the entire sentence to your computer's clipboard, just like Ctrl + C would.
If you are in Concordance layout, you will see that sentence boundaries are marked with the symbols <s> and </s>:
<s> marks the beginning of a sentence
</s> marks the end of a sentence
These markers are used in all the corpora on this site. They are also used in the Sentences layout: in that layout, only the sentence containing the word (or words) corresponding to your search is displayed.
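As a rough illustration of how the sentence markers relate the two layouts, the following Python sketch picks out the one sentence that contains a given word from a stretch of marked-up text. The function names and the simple tokenisation are assumptions; the site's own processing is not described in this article.

```python
import re

def sentences_from_markup(text):
    """Return the text of each <s>...</s> span, in order."""
    spans = re.findall(r"<s>(.*?)</s>", text, flags=re.DOTALL)
    return [span.strip() for span in spans]

def sentence_containing(text, word):
    """Return the first marked sentence that contains `word`, or None."""
    for sentence in sentences_from_markup(text):
        # Naive tokenisation: runs of letters and digits count as words.
        if word in re.findall(r"\w+", sentence):
            return sentence
    return None
```

Given the text "<s> Tá sé fuar. </s><s> Bhí an teach mór. </s>" and the word "teach", this sketch returns only the second sentence, much as the Sentences layout shows only the sentence in which the match occurs.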
These markers are also useful because multi-word searches can be limited to results that fall within a single sentence. Knowledge of the CQL query language is required for this: see the article Introducing CQL.
This marking of sentence boundaries was done by an automatic computer program, so it is not always 100% accurate. The program may occasionally have misinterpreted punctuation, treating it as end-of-sentence punctuation when it was not, or vice versa. This is sometimes unavoidable.
Clicking on any word in the search results will open a word information box.
This box tells you what the corpus knows about that particular word: the form of the word, the lemma of the word, and the grammatical tag of the word, plus a link to explain the meaning of the tag. To understand this information, you are advised to read the article Words, lemmas and tags.
The word information box can be closed by clicking on the word again.