# Findings with the Elasticsearch REST API

## Different search query methods

**(Multi-)match** vs. **(simple_)query_string**

- **Match**: Returns documents that match a provided text, number, date or boolean value.
- **query_string**: Returns documents based on a provided query string, using a parser with a strict syntax. `simple_query_string` is more flexible than `match` and has a simpler, more robust syntax than `query_string`.

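To make the comparison concrete, the sketch below builds the request body for each query type as a plain Python dictionary. The field name `title` and the helper names are placeholders chosen for illustration; they are not taken from the index used in these notes.

```python
import json


def match_body(field, text):
    """Request body for a `match` query on a single field."""
    return {"query": {"match": {field: {"query": text}}}}


def multi_match_body(fields, text):
    """Request body for a `multi_match` query over several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def simple_query_string_body(fields, text):
    """Request body for a `simple_query_string` query.

    Unlike the match variants, the text may contain operators,
    which the query parser interprets.
    """
    return {
        "query": {
            "simple_query_string": {"query": text, "fields": list(fields)}
        }
    }


# "title" is a hypothetical field name used only for this example.
print(json.dumps(simple_query_string_body(["title"], "Python programming"), indent=2))
```

The three bodies differ only in the query type and in where the field names go, so switching between them while experimenting is cheap.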
## Improving the search

The following examples all use `simple_query_string`.

### Basic title search

Let's start with a simple example (GET: https://localhost:9200/_search):

```json
{
  "query": {
    "simple_query_string": {
      "query": "Python programming",
      "fields": ["_content.Page.Content.Title.~"]
    }
  }
}
```

This searches for the two words *Python* and *programming* in the title field.

The returned results contain 5 documents:

1. "Programming for linguistics" (Score: 4.294303)
2. "Programming for linguistics" (Score: 4.294303)
3. "Python basics" (Score: 4.268799)
4. "Stanford NER from Python" (Score: 3.556681)
5. "Stanford PoS Tagger: tagging from Python" (Score: 3.0481849)

The problem is that this query matches any title containing *python* OR *programming*, not only titles containing both.

### AND & OR operators

To search for results containing both words, set the *default_operator* parameter (default: "or") to "and":

```json
{
  "query": {
    "simple_query_string": {
      "query": "Python programming",
      "fields": ["_content.Page.Content.Title.~"],
      "default_operator": "and"
    }
  }
}
```

Now no results are returned, since no document title contains both words.

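The difference between the two operators can be reproduced offline on the five titles listed earlier. This is a simplified sketch: `tokenize` is only a rough stand-in for the Elasticsearch analyzer (an assumption, not its actual implementation), and relevance scoring is ignored.

```python
import re

# The five titles returned by the basic title search.
TITLES = [
    "Programming for linguistics",
    "Programming for linguistics",
    "Python basics",
    "Stanford NER from Python",
    "Stanford PoS Tagger: tagging from Python",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_titles(query, titles, default_operator="or"):
    """Return the titles that contain any ("or") or all ("and") query terms."""
    terms = tokenize(query)
    combine = all if default_operator == "and" else any
    return [t for t in titles if combine(term in tokenize(t) for term in terms)]


or_hits = matching_titles("Python programming", TITLES, "or")
and_hits = matching_titles("Python programming", TITLES, "and")
print(len(or_hits), len(and_hits))  # prints "5 0"
```

With "or" every title qualifies because each contains at least one of the two words; with "and" none does, which matches the empty result above.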
AND and OR can also be specified inside the query string itself, with `+` and `|`. Other operators are described in the [simple query string syntax](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax).

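As a small illustration of in-string operators, the helper below joins terms with `+` (AND) or `|` (OR) before placing them in the query. The function name `combine_terms` is hypothetical, and the spacing simply mirrors the style of the examples in the linked syntax reference.

```python
def combine_terms(terms, operator="and"):
    """Join terms with `+` (AND) or `|` (OR) for a simple query string."""
    if operator == "and":
        # Prefix every term after the first with `+`.
        return " +".join(terms)
    if operator == "or":
        return " | ".join(terms)
    raise ValueError(f"unsupported operator: {operator!r}")


def title_query(terms, operator="and"):
    """Build a request body that searches the title field used in these notes."""
    return {
        "query": {
            "simple_query_string": {
                "query": combine_terms(terms, operator),
                "fields": ["_content.Page.Content.Title.~"],
            }
        }
    }


print(combine_terms(["Python", "programming"], "and"))  # Python +programming
print(combine_terms(["Python", "programming"], "or"))   # Python | programming
```

Writing the operator into the string allows mixing AND and OR in one query, which the single *default_operator* setting cannot express.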
### Search multiple fields

We can search more fields by adding their paths to the *fields* parameter:

```json
"fields": ["_content.Page.Content.Title.~",
           "_content.Page.Content.Body.~"]
```

Now the search returns 11 documents.

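For scripted experiments, the multi-field query can be prepared as an HTTP request with Python's standard library. The sketch below only constructs the request and does not send it; authentication and TLS settings depend on the cluster and are left out, and the helper name `build_search_request` is an assumption made for this example.

```python
import json
import urllib.request

SEARCH_URL = "https://localhost:9200/_search"

FIELDS = [
    "_content.Page.Content.Title.~",
    "_content.Page.Content.Body.~",
]


def build_search_request(text, fields=FIELDS, default_operator="or"):
    """Prepare (but do not send) a search request with a JSON body."""
    body = {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_search_request("Python programming")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the search once credentials and certificate handling are configured for the cluster.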
### Use fuzziness for typos

Since the search only matches exact terms, the following query containing a typo does not return any results:

```json
{
  "query": {
    "simple_query_string": {
      "query": "Pytho",
      "fields": [
        "_content.Page.Content.Title.~",
        "_content.Page.Content.Body.~"
      ],
      "default_operator": "and"
    }
  }
}
```

To solve this issue, we add automatic fuzziness to the query by appending `~` to the term:

```json
"query": "Pytho~"
```

Now we retrieve 28 different documents.

The edit distance for fuzziness can be specified after the tilde, as in `"query": "Pytho~2"`, but the automatic fuzziness setting is [recommended](https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness) as it takes term length into account.

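To show what an edit distance means here, the sketch below computes the plain Levenshtein distance between two terms and mirrors the default length thresholds documented for `AUTO` fuzziness (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). It is a simplification: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this function does not, and both function names are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Allowed edits for a term under the default AUTO length thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(edit_distance("pytho", "python"))  # 1
print(auto_fuzziness("pytho"))           # 1
```

The five-letter term "pytho" is allowed one edit, and "python" is exactly one insertion away, which is consistent with the fuzzy query finding documents that the exact query missed.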