I have an index "name_and_title_index" that contains two fields, "name" and "title".
indextool gives me the following information about the keywords I am interested in:
keyword ,docs ,hits ,offset
word7 ,56 ,57 ,519386707
word8 ,154 ,161 ,475390304
word2 ,2438 ,2597 ,14258546
word3 ,26599 ,29074 ,68018978
word5 ,475349 ,656569 ,191390685
word1 ,645079 ,881965 ,303666122
word6 ,1089457 ,1435180 ,350540391
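indexed_documents - 10742342, total keywords - 1379888
It seems I don't understand the rankers, because all of them return the results in a different order from the one I expect.
I expected any result containing word7 to get a higher weight (it occurs in only 56 of the 10.7M documents).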
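To make that expectation concrete, here is a rough Python sketch that turns the indextool document counts into rarity scores. It uses the plain textbook log(N/n) inverse document frequency, which is an assumption for illustration only and not necessarily the formula Sphinx applies internally; the names `textbook_idf`, `TOTAL_DOCS` and `DOC_COUNTS` are made up for this example.

```python
import math

# Figures copied from the indextool output above.
TOTAL_DOCS = 10_742_342
DOC_COUNTS = {
    "word7": 56,
    "word8": 154,
    "word2": 2438,
    "word3": 26599,
    "word5": 475349,
    "word1": 645079,
    "word6": 1089457,
}


def textbook_idf(docs_with_term, total_docs=TOTAL_DOCS):
    """Plain log(N/n) IDF. Assumption: not Sphinx's internal formula."""
    return math.log(total_docs / docs_with_term)


# Rarity score per keyword, printed rarest first.
idf = {word: textbook_idf(n) for word, n in DOC_COUNTS.items()}
for word, value in sorted(idf.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {value:.2f}")
```

Under this measure word7 scores highest and word6 lowest, which is why I expected matches on word7 to dominate the ranking.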
The SphinxQL query is:
SELECT
ID,
WEIGHT(),
SNIPPET(name, 'word1 word2 word3 word4 word5 word6') AS _name,
SNIPPET(title, 'word7 word8 word9') AS _title
FROM
name_and_title_index
WHERE
MATCH('@name "word1 word2 word3 word4 word5 word6"/0.5 @title "word7 word8 word9"/0.5')
The different rankers give me the following results (columns: id, weight, _name, _title):
RANKER=PROXIMITY_BM25;
| 1 | 6546 | _ <b>word6</b> <b>word1</b> <b>word2</b> <b>word3</b> | _ _ <b>word8</b> _ _ <b>word7</b> |
| 4 | 6528 | _ _ _ _ _ _ _ _ <b>word2</b> <b>word3</b> <b>word4</b> _ | _ _ <b>word8</b> _ _ _ _ _ ... |
| 2 | 4521 | <b>word5</b> <b>word6</b> _ _ _ _ _ _ <b>word1</b> _ _ | _ <b>word7</b> _ _ _ _ _ _ _ _ ... |
| 3 | 4520 | <b>word5</b> _ <b>word1</b> _ _ _ _ _ <b>word6</b> _ _ | _ _ _ _ _ _ _ _ _ _ _ _ <b>word7</b> |
| 5 | 4519 | <b>word1</b> _ _ _ _ _ <b>word5</b> <b>word6</b> _ _ _ _ | _ _ _ _ _ _ <b>word8</b> _ _ _ _ _ _ |
| 6 | 2520 | <b>word5</b> _ _ _ _ _ ... _ _ _ _ <b>word6</b> _ _ _ _ _ ... | ... _ _ _ _ _ _ _ <b>word8</b> _ _ |
RANKER=BM25;
| 1 | 2546 | _ <b>word6</b> <b>word1</b> <b>word2</b> <b>word3</b> | _ _ <b>word8</b> _ _ <b>word7</b> |
| 4 | 2528 | _ _ _ _ _ _ _ _ <b>word2</b> <b>word3</b> <b>word4</b> _ | _ _ <b>word8</b> _ _ _ _ _ ... |
| 2 | 2521 | <b>word5</b> <b>word6</b> _ _ _ _ _ _ <b>word1</b> _ _ | _ <b>word7</b> _ _ _ _ _ _ _ _ ... |
| 3 | 2520 | <b>word5</b> _ <b>word1</b> _ _ _ _ _ <b>word6</b> _ _ | _ _ _ _ _ _ _ _ _ _ _ _ <b>word7</b> |
| 5 | 2520 | <b>word1</b> _ _ _ _ _ <b>word5</b> <b>word6</b> _ _ _ _ | _ _ _ _ _ _ <b>word8</b> _ _ _ _ _ _ |
| 6 | 2519 | <b>word5</b> _ _ _ _ _ ... _ _ _ _ <b>word6</b> _ _ _ _ _ ... | ... _ _ _ _ _ _ _ <b>word8</b> _ _ |
RANKER=SPH04;
| 4 | 16528 | _ _ _ _ _ _ _ _ <b>word2</b> <b>word3</b> <b>word4</b> _ | _ _ <b>word8</b> _ _ _ _ _ ... |
| 1 | 14546 | _ <b>word6</b> <b>word1</b> <b>word2</b> <b>word3</b> | _ _ <b>word8</b> _ _ <b>word7</b> |
| 2 | 14521 | <b>word5</b> <b>word6</b> _ _ _ _ _ _ <b>word1</b> _ _ | _ <b>word7</b> _ _ _ _ _ _ _ _ ... |
| 3 | 14520 | <b>word5</b> _ <b>word1</b> _ _ _ _ _ <b>word6</b> _ _ | _ _ _ _ _ _ _ _ _ _ _ _ <b>word7</b> |
| 5 | 14519 | <b>word1</b> _ _ _ _ _ <b>word5</b> <b>word6</b> _ _ _ _ | _ _ _ _ _ _ <b>word8</b> _ _ _ _ _ _ |
| 6 | 10520 | <b>word5</b> _ _ _ _ _ ... _ _ _ _ <b>word6</b> _ _ _ _ _ ... | ... _ _ _ _ _ _ _ <b>word8</b> _ _ |
Why does result 4 always rank above results 2 and 3 (and, with SPH04, above result 1 as well)?
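For reference, a small Python sketch that splits each reported weight into its thousands and its remainder. This assumes the weight has the shape "field-level factor * 1000 + bm25", with the bm25 term staying below 1000, which is how the Sphinx documentation describes PROXIMITY_BM25 (sum(lcs*user_weight)*1000+bm25); whether the other two rankers follow the same shape in my setup is an assumption. The names `WEIGHTS` and `split_weight` are made up for this example.

```python
# Weights copied from the result tables above: ranker -> {result id: weight}.
WEIGHTS = {
    "PROXIMITY_BM25": {1: 6546, 4: 6528, 2: 4521, 3: 4520, 5: 4519, 6: 2520},
    "BM25": {1: 2546, 4: 2528, 2: 2521, 3: 2520, 5: 2520, 6: 2519},
    "SPH04": {4: 16528, 1: 14546, 2: 14521, 3: 14520, 5: 14519, 6: 10520},
}


def split_weight(weight):
    """Return (thousands, remainder): the assumed field-level part and bm25 part."""
    return divmod(weight, 1000)


for ranker, rows in WEIGHTS.items():
    print(ranker)
    for result_id, weight in rows.items():
        field_part, bm25_part = split_weight(weight)
        print(f"  result {result_id}: field part {field_part}, remainder {bm25_part}")
```

If that reading is right, the remainders of results 4, 2 and 3 differ by only a few points (528, 521, 520), while the thousands differ by whole units, so keyword rarity would have very little influence on the final order.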