Solr - How to sort by the frequency of a specific field's values?

Asked: 2013-07-26 13:31:06

Tags: java eclipse sorting solr solrj

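I am using Java and SolrJ in Eclipse. How can I sort the results of a SolrQuery by the number of occurrences of the values in a given field? For example, when I search for the first n articles (docType = 0) by a particular author, I want to sort the query results by the frequency of the values in the journal_facet field (of type string).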

If an author X has written:

  • 2 articles (a0, a1) in journal J0
  • 3 articles (a2, a3, a4) in journal J1
  • 1 article (a5) in journal J2

then the order must be a2, a3, a4, a0, a1, a5, and I would like the results to be displayed as follows:

<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a2</str>
 <str name="journal">J1</str>
</doc>
<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a3</str>
 <str name="journal">J1</str>
</doc>
<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a4</str>
 <str name="journal">J1</str>
</doc>
<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a0</str>
 <str name="journal">J0</str>
</doc>
<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a1</str>
 <str name="journal">J0</str>
</doc>
<doc>
 <arr name="author">
  <str>X</str>
 </arr>
 <str name="title">a5</str>
 <str name="journal">J2</str>
</doc>

My query is:

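import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

SolrServer solrServer = new HttpSolrServer(urlString);
SolrQuery query = new SolrQuery();
query.set("q", "docType:0");          // articles only
query.set("fq", "author:X");          // restrict to author X
query.set("fl", "author, title, journal");
query.setRows(n);                     // first n results
...
QueryResponse response = solrServer.query(query);
SolrDocumentList results = response.getResults();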

My Solr schema.xml contains the following fields and types:

<types>
    ...
    <fieldType name="text_title" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <charFilter class="solr.HTMLStripCharFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                stemEnglishPossessive="1" preserveOriginal="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
            <filter class="solr.KStemFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
                stemEnglishPossessive="1" preserveOriginal="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.KStemFilterFactory" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
    </fieldType>

    <fieldType name="text_name" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <charFilter class="solr.HTMLStripCharFilterFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <!-- n-grams, useful for prefix search -->
            <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
            <!-- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> -->
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.ASCIIFoldingFilterFactory" />
            <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <!-- <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> -->
        </analyzer>
    </fieldType>
</types>
<fields>
    <field name="docType" type="tint" indexed="true" stored="true"
        multiValued="false" required="true" />
    <field name="key" type="string" indexed="true" stored="true"
        multiValued="false" required="true" />
    <field name="mdate" type="date" indexed="true" stored="true"
        multiValued="false" required="true" />
    ...
    <field name="author" type="text_name" indexed="true" stored="true"
        multiValued="true" />
    ...
    <field name="journal" type="text_title" indexed="true" stored="true"
        multiValued="false" />
    <field name="title" type="text_title" indexed="true" stored="true"
        multiValued="false" />
    ...
    <field name="journal_facet" type="string" indexed="true" stored="true"
        multiValued="false" />
    ...
    <copyField dest="journal_facet" source="journal" />
    ...
</fields>

Thank you very much for your help.

2 answers:

Answer 0 (score: 0)

  

How about writing a custom function query and sorting by it:

http://localhost:8983/solr/select?q=*:*&sort=dist(2, point1, point2) desc
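The URL above contains spaces and parentheses in the sort parameter, so it usually has to be URL-encoded before it is sent from Java. Below is a minimal sketch using only the standard library; the class name SortUrlBuilder and the method selectUrl are hypothetical, and the dist(...) expression is just the placeholder function from the URL above, not a frequency function.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of SolrJ): builds a /select URL with an encoded sort function.
final class SortUrlBuilder {

    /** Appends URL-encoded q and sort parameters to the Solr base URL. */
    static String selectUrl(String baseUrl, String q, String sort) {
        return baseUrl + "/select"
            + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
            + "&sort=" + URLEncoder.encode(sort, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same base URL, query and sort expression as in the example URL above.
        System.out.println(selectUrl("http://localhost:8983/solr", "*:*",
            "dist(2, point1, point2) desc"));
    }
}
```

With SolrJ the encoding is handled by the client, so the same sort expression could instead be passed through query.set("sort", ...) in the style of the question's code.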


Answer 1 (score: 0)

If you are faceting on the results, just use facet.sort to get the facets sorted by frequency:

https://wiki.apache.org/solr/SimpleFacetParameters#facet.sort
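Note that facet.sort orders the facet values, not the documents themselves. One possible way to get the document order asked for in the question is to re-order the returned documents on the client by how often their journal value occurs. The sketch below does this with plain Java collections rather than SolrJ types: each document is modeled as a Map<String, String>, the counts are computed from the documents themselves (they could equally come from a facet on journal_facet), and the class and method names are hypothetical. It assumes every document has a non-null value for the chosen field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; documents are plain maps, not SolrDocument objects.
final class FieldFrequencySort {

    /** Returns a copy of docs ordered by how often each doc's field value occurs, most frequent first. */
    static List<Map<String, String>> sortByFieldFrequency(List<Map<String, String>> docs, String field) {
        // Count how many documents share each value of the field.
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            counts.merge(doc.get(field), 1, Integer::sum);
        }
        // Highest count first; ties are grouped by field value so equal-sized groups do not interleave.
        Comparator<Map<String, String>> byCountDesc =
            Comparator.comparingInt((Map<String, String> d) -> counts.get(d.get(field))).reversed();
        List<Map<String, String>> sorted = new ArrayList<>(docs);
        sorted.sort(byCountDesc.thenComparing(d -> d.get(field))); // List.sort is stable
        return sorted;
    }

    private static Map<String, String> doc(String title, String journal) {
        Map<String, String> d = new HashMap<>();
        d.put("title", title);
        d.put("journal", journal);
        return d;
    }

    public static void main(String[] args) {
        // The six articles of author X from the question.
        List<Map<String, String>> docs = List.of(
            doc("a0", "J0"), doc("a1", "J0"),
            doc("a2", "J1"), doc("a3", "J1"), doc("a4", "J1"),
            doc("a5", "J2"));
        for (Map<String, String> d : sortByFieldFrequency(docs, "journal")) {
            System.out.println(d.get("title") + " " + d.get("journal"));
        }
        // Titles are printed in the order a2, a3, a4, a0, a1, a5.
    }
}
```

This only re-orders the rows that were actually fetched, so the counts reflect the first n results rather than the whole index. If the frequencies over all of the author's articles are needed, they would have to come from the facet counts instead.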