Solr suggester: duplicate results with distributed search (SolrCloud)

Time: 2015-01-05 19:15:30

Tags: solr

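I have two shards, and I am trying to implement a suggester using distributed search across them (Solr 4.10.1). The suggester appears to query each shard and concatenate the result sets, which leaves duplicates. My solrconfig.xml contains the following: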

<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">AnalyzingLookupFactory</str>
      <str name="lookupImpl">FreeTextSuggesterFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title_sug</str>
      <str name="weightField">rank</str>
      <str name="suggestAnalyzerFieldType">shingleSuggest</str>
      <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>


<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

The request http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&shards.qt=/suggest&shards=shard1,shard2&suggest.q=an&wt=json&indent=true returns:

{
  "responseHeader":{
    "status":0,
    "QTime":12},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an",
            "weight":142,
            "payload":""},
          {
            "term":"an american",
            "weight":6,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an american hymn (main",
            "weight":2,
            "payload":""}]}}}}

As shown above, the term "an" is returned twice, once per shard. If I run the same query with distrib=false (http://localhost:8983/solr/collection1/suggest?suggest.dictionary=titleSuggester&distrib=false&suggest.q=an&wt=json&indent=true), I get no duplicates, as expected:

{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "suggest":{"titleSuggester":{
      "an":{
        "numFound":10,
        "suggestions":[{
            "term":"an",
            "weight":149,
            "payload":""},
          {
            "term":"an 18th",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing",
            "weight":2,
            "payload":""},
          {
            "term":"an 18th century drawing room",
            "weight":2,
            "payload":""},
          {
            "term":"an absolution take",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to",
            "weight":1,
            "payload":""},
          {
            "term":"an absolution take her to sea,",
            "weight":1,
            "payload":""},
          {
            "term":"an affair",
            "weight":4,
            "payload":""}]}}}}

Is there a way to remove the duplicate results?

1 Answer:

Answer 0 (score: 0)

You can use Solr's grouping feature; add the following to your query:

&group=true&group.field=term&group.main=true

This returns only one document for each identical term, and group.main=true returns them in the same format as a regular query.

For details, see http://wiki.apache.org/solr/FieldCollapsing
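Another option is to drop the duplicates on the client after the merged response comes back. The following is only a minimal sketch, assuming the JSON layout shown in the question (suggest, then the dictionary name, then the query string, then suggestions). The function name dedupe_suggestions is hypothetical and not part of Solr, and keeping the highest weight per term is an assumption; summing the per-shard weights may suit some setups better.

```python
def dedupe_suggestions(response, dictionary, query):
    """Collapse suggestions that share a term, keeping the highest weight.

    Hypothetical helper, not a Solr API. Expects the parsed JSON response
    of the suggest handler in the shape shown in the question.
    """
    entry = response["suggest"][dictionary][query]
    best = {}
    for suggestion in entry["suggestions"]:
        term = suggestion["term"]
        # Keep the first occurrence unless a later one has a higher weight.
        if term not in best or suggestion["weight"] > best[term]["weight"]:
            best[term] = suggestion
    # Highest weight first; ties keep their original relative order.
    return sorted(best.values(), key=lambda s: s["weight"], reverse=True)


# Abbreviated sample taken from the distributed response in the question.
sample = {
    "suggest": {"titleSuggester": {"an": {
        "numFound": 4,
        "suggestions": [
            {"term": "an", "weight": 149, "payload": ""},
            {"term": "an", "weight": 142, "payload": ""},
            {"term": "an american", "weight": 6, "payload": ""},
            {"term": "an affair", "weight": 4, "payload": ""},
        ]}}}
}

result = dedupe_suggestions(sample, "titleSuggester", "an")
print([(s["term"], s["weight"]) for s in result])
# [('an', 149), ('an american', 6), ('an affair', 4)]
```

Because duplicates are removed after the merge, the list can end up shorter than suggest.count, so you may want to request a larger suggest.count than you intend to display.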