Finding duplicate objects with Solr 4 and Haystack

Date: 2016-04-13 11:02:00

Tags: solr django-haystack

I use Solr's faceting to find duplicates. It works well, but I can't figure out how to get the object ids.

>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet().facet('text_string', limit=-1)
>>> sqs.facet_counts()
{
    'dates': {},
    'fields': {
        'text_string': [
            ('the red ballon', 4),
            ('my grand pa is an alien', 2),
            ('be kind rewind', 12),
        ],
    },
    'queries': {}
}

How can I get the ids of my objects 'the red ballon', 'my grand pa is an alien', and so on? Do I have to add an id field to Solr's schema.xml?

I was expecting something like this:

>>> sqs.facet_counts()
{
    'dates': {},
    'fields': {
        'text_string': [
            (object_id, 'the red ballon', 4),
            (object_id, 'my grand pa is an alien', 2),
            (object_id, 'be kind rewind', 12),
        ],
    },
    'queries': {}
}

Edit: added schema.xml and search_indexes.py

Solr's schema.xml

...
  <fields>
    <!-- general -->
    <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
    <dynamicField name="*_s"  type="string"  indexed="true"  stored="true"/>
    <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
    <dynamicField name="*_t"  type="text_en"    indexed="true"  stored="true"/>
    <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
    <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
    <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
    <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
    <dynamicField name="*_p" type="location" indexed="true" stored="true"/>
    <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

    <field name="text" type="text_en" indexed="true" stored="true" multiValued="false"  termVectors="true" />
    <field name="title" type="text_en" indexed="true" stored="true" multiValued="false"  />

    <!-- Used for duplicate content detection --> 
    <copyField source="title" dest="text_string" />
    <field name="text_string" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="pk" type="long" indexed="true" stored="true" multiValued="false" />

  </fields>

  <!-- field to use to determine and enforce document uniqueness. -->
  <uniqueKey>id</uniqueKey>

  <!-- field for the QueryParser to use when an explicit fieldname is absent -->
  <defaultSearchField>text</defaultSearchField>

  <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
  <solrQueryParser defaultOperator="AND"/>
...

search_indexes.py

from haystack import indexes

# Video is the project's own model; import it from wherever it is defined.


class VideoIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    pk = indexes.IntegerField(model_attr='pk')
    title = indexes.CharField(model_attr='title', boost=1.125)

    def index_queryset(self, using=None):
        return Video.on_site.all()

    def get_model(self):
        return Video

1 answer:

Answer 0 (score: 0)

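Faceting is the arrangement of search results into categories based on indexed terms. For each category, Solr reports the number of hits for the corresponding term, which is called a facet constraint. Faceting makes it easy for users to explore search results on sites such as movie sites and product review sites, where there are many categories and many items within each category.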
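As a rough illustration of what a field facet reports, the sketch below counts documents per distinct value in plain Python. It does not call Solr; the helper name `facet_counts_for` and the sample titles are made up for this example.

```python
from collections import Counter


def facet_counts_for(values):
    """Return (value, count) pairs, most frequent first, like a field facet."""
    return Counter(values).most_common()


# Hypothetical indexed values of the text_string field.
titles = [
    'the red ballon',
    'be kind rewind',
    'the red ballon',
    'my grand pa is an alien',
    'be kind rewind',
    'be kind rewind',
]

for value, count in facet_counts_for(titles):
    print(value, count)
```

Any value with a count above 1 is a duplicate candidate, but the counts alone carry no document ids, which is why the facet result in the question cannot contain them.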

Here are some good examples:

faceting example by Yonik

faceting example on solr wiki

In your case, you will probably need to fire a second query to get the ids and other details.
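A minimal sketch of that second step, assuming the facet result has the shape shown in the question. The helper names `duplicated_values` and `ids_for_duplicates` are hypothetical, and the lookup is passed in as a callable so the logic can be tried without a running Solr instance.

```python
def duplicated_values(facet_counts, field, min_count=2):
    """Return the facet values of `field` seen at least `min_count` times."""
    pairs = facet_counts.get('fields', {}).get(field, [])
    return [value for value, count in pairs if count >= min_count]


def ids_for_duplicates(facet_counts, field, fetch_ids):
    """Map each duplicated value to the ids returned by fetch_ids(value)."""
    return {
        value: list(fetch_ids(value))
        for value in duplicated_values(facet_counts, field)
    }


# With Haystack, fetch_ids could run one extra query per duplicated value.
# This is an assumption about how you would wire it up, not tested here:
#
#   from haystack.query import SearchQuerySet
#
#   def fetch_ids(value):
#       results = SearchQuerySet().filter(text_string__exact=value)
#       return [result.pk for result in results]
#
#   dupes = ids_for_duplicates(sqs.facet_counts(), 'text_string', fetch_ids)
```

This issues one query per duplicated value, so it is only practical when the number of duplicated values is small. Solr's `facet.mincount` parameter can also drop non-duplicated values on the server side instead of filtering them in Python.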