Question

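I'm using the output of myisam_ftdump to generate a search suggestions table. That process went smoothly, but a number of words appear multiple times in the index. Obviously, I can just SELECT distinct term FROM suggestions ORDER BY weight, but wouldn't that penalize words for appearing more than once?

If so, is there a concise formula for merging the rows?

If not, which rows should I keep (e.g., the highest-weighted or the lowest-weighted)?

Sample data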

+-----+------------+----------+
| id  | word       | weight   |
+-----+------------+----------+
| 670 | young      | 0.416022 |
| 669 | york       |  0.54944 |
| 668 | years      | 0.281683 |
| 667 | years      | 0.416022 |
| 666 | wrote      | 0.416022 |
| 665 | written    |  0.35841 |
| 664 | writing    |  0.29518 |
| 663 | wright     | 0.281683 |
| 662 | witness    | 0.281683 |
| 661 | wiesenthal | 0.452452 |
| 660 | white      |  0.35841 |
| 659 | white      | 0.281683 |
| 658 | wgbh       | 0.369332 |
| 657 | weighs     |  0.35841 |
+-----+------------+----------+

See 'white' and 'years' in particular.

Answer 1

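It looks like you ran myisam_ftdump -d. I think you want myisam_ftdump -c instead.

That gives you one row per word, along with the number of times the word appears in the index and its global weight.

Here is the documentation for -c versus -d: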

  -c, --count         Calculate per-word stats (counts and global weights).
  -d, --dump          Dump index (incl. data offsets and word weights).
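
If re-running the dump isn't convenient, the duplicates already in the table can at least be inspected with a GROUP BY. Below is a rough sketch of that idea, not something myisam_ftdump itself prescribes. It uses Python's sqlite3 as a stand-in for MySQL, assumes a table named suggestions with the columns id, word and weight shown in the question, and loads only a few of the sample rows. Keeping MAX(weight) is an arbitrary choice for illustration; whether the maximum, a sum, or the global weight from -c is the right value depends on what the suggestions should favor.

```python
import sqlite3

# A few sample rows copied from the question: (id, word, weight).
ROWS = [
    (670, "young", 0.416022),
    (669, "york", 0.54944),
    (668, "years", 0.281683),
    (667, "years", 0.416022),
    (660, "white", 0.35841),
    (659, "white", 0.281683),
]


def merged_suggestions(rows):
    """Collapse duplicate words into one row each.

    Returns (word, occurrences, max_weight) tuples, highest weight first.
    The table and column names are assumptions based on the question.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE suggestions (id INTEGER PRIMARY KEY, word TEXT, weight REAL)"
    )
    conn.executemany("INSERT INTO suggestions VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT word, COUNT(*) AS occurrences, MAX(weight) AS max_weight
        FROM suggestions
        GROUP BY word
        ORDER BY max_weight DESC, word
        """
    ).fetchall()
    conn.close()
    return result


merged = merged_suggestions(ROWS)
for word, occurrences, max_weight in merged:
    print(word, occurrences, max_weight)
```

The same SELECT should run unchanged on MySQL. The occurrences column makes it easy to see which words were duplicated before deciding how to combine their weights.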

How should I handle the weights of duplicate entries in a MyISAM search index?