## Score calculation

Source: https://www.compose.com/articles/how-scoring-works-in-elasticsearch/

$`score(q, d) = queryNorm(q) \cdot coord(q, d) \cdot \sum_{t \in q} \left( tf(t) \cdot idf(t)^2 \cdot t.getBoost() \cdot norm(t, d) \right)`$

$`score(q, d)`$: The relevance score of a document $`d`$ for a query $`q`$ with terms $`t`$.

$`queryNorm(q)`$: The query normalization factor. It is used so that the scores of different queries can be compared.

$`coord(q, d)`$: The coordination factor. It rewards documents that match more of the query terms: a document that contains 3 query terms is ranked higher than a document that contains only 2. (Turn this off for synonyms.)

Then, for each term $`t`$ in $`q`$, the following factors are multiplied and the products are summed:

- Query boost: Combined with the queryNorm in the explanation output. Individual queries can be boosted positively or negatively when multiple queries are executed.
- Term frequency (tf): $`tf = \sqrt{termFreq}`$
- Inverse document frequency (idf): $`idf = 1 + \ln(maxDocs / (docFreq + 1))`$
- Field length normalization (norm): $`norm = 1 / \sqrt{numFieldTerms}`$, the inverse square root of the number of terms in the field. It normalizes for field length, so that long documents with many terms do not outscore short documents with fewer terms simply because they are longer.

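The factors above can be combined in a small sketch. This is an illustration under stated assumptions, not Elasticsearch's actual implementation: the function names are hypothetical, `coord` is assumed to be the fraction of distinct query terms found in the field (the text does not give its formula), `query_norm` is left at 1.0 because no formula for it is given here, and the index statistics are made up.

```python
import math


def tf(term_freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(term_freq)


def idf(max_docs, doc_freq):
    """Inverse document frequency: rarer terms get a larger value."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def norm(num_field_terms):
    """Field length normalization: inverse square root of the field length."""
    return 1.0 / math.sqrt(num_field_terms)


def score(query_terms, field_terms, doc_freqs, max_docs, boosts=None, query_norm=1.0):
    """Sum tf * idf^2 * boost * norm over matching terms, scaled by coord.

    ASSUMPTION: coord = matched distinct query terms / all distinct query terms.
    ASSUMPTION: query_norm defaults to 1.0 (no formula given in the text).
    """
    boosts = boosts or {}
    unique_terms = set(query_terms)
    # Terms absent from the field have tf = 0, so only matches contribute.
    matched = [t for t in unique_terms if t in field_terms]
    if not matched:
        return 0.0
    coord = len(matched) / len(unique_terms)
    field_norm = norm(len(field_terms))
    total = sum(
        tf(field_terms.count(t)) * idf(max_docs, doc_freqs[t]) ** 2
        * boosts.get(t, 1.0) * field_norm
        for t in matched
    )
    return query_norm * coord * total


# Toy index statistics, made up for illustration.
MAX_DOCS = 1000
DOC_FREQS = {"elasticsearch": 9, "scoring": 99}

query = ["elasticsearch", "scoring"]
short_doc = ["elasticsearch", "scoring", "explained", "simply"]  # 4 terms
long_doc = ["elasticsearch", "scoring"] + ["filler"] * 14        # 16 terms

short_score = score(query, short_doc, DOC_FREQS, MAX_DOCS)
long_score = score(query, long_doc, DOC_FREQS, MAX_DOCS)
print(f"short: {short_score:.3f}, long: {long_score:.3f}")
```

Both toy documents match each query term once, so they differ only in the norm factor: 0.5 for the 4-term field versus 0.25 for the 16-term field. In this sketch the shorter field therefore scores twice as high.
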
### Field Boost

A field can be given a higher weight so that it influences the overall score more than other fields. In the example below, the `^2` suffix boosts the title field by a factor of 2.

```json
"fields": [ "_content.Page.Content.Title.~^2",
            "_content.Page.Content.Body.~"]
```

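As a rough sketch of where such a `fields` array could sit in a full request body, the snippet below builds a `multi_match` query. The query type is an assumption, since the text shows only the `fields` fragment, and `build_query` is a hypothetical helper rather than part of any client library.

```python
import json


def build_query(text, field_boosts):
    """Build a search body with per-field boosts (hypothetical helper).

    ASSUMPTION: a multi_match query is used; the source shows only "fields".
    A boost of 1 is the default, so no "^" suffix is added for it.
    """
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in field_boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = build_query(
    "scoring",
    {
        "_content.Page.Content.Title.~": 2,  # title matches weigh more
        "_content.Page.Content.Body.~": 1,   # default weight
    },
)
print(json.dumps(body, indent=2))
```

Keeping the boosts in a dictionary makes it easy to adjust weights in one place while the `^` suffix is generated consistently.
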