Question

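I have Solr 4.10.0 and have indexed some books. Each document in the schema is one page of a book, so every document has fields such as PageID, BookID, PageNum, Content, and so on. The field definitions in schema.xml look like this: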

<field name="PageID" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 

   <field name="Content" type="text_ar" indexed="true" stored="true" required="true" termVectors="true" />
   <field name="PageNum" type="int" indexed="false" stored="true" required="false" multiValued="false" />
   <field name="Part" type="int" indexed="false" stored="true" required="false" multiValued="false" />

   <field name="BookID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
   <field name="BookTitle" type="text_ar" indexed="true" stored="true" required="true" />
   <field name="BookInfo" type="text_ar" indexed="true" stored="true" required="true" />
   <field name="BookCat" type="int" indexed="false" stored="true" required="false" multiValued="false" />

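Problem

When I search the Content field, which holds the page text, I get multiple results from the same book. That is expected, since a given word can appear on many pages of a book. I tried to get something like SQL DISTINCT with queries such as the following:

Using facet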

http://localhost:8080/solr/books/select/?q=Content:WordOfSearch&sort=PageID%20desc&version=2.2&start=0&rows=10&indent=on&wt=json&facet=on&facet.field=BookID&facet.limit=1&hl=true&hl.q=Content:WordOfSearch

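In the query above I set facet.field=BookID so that only one result per book would be returned. However, this does not work as expected: the results come back as if facet were not used at all. In other words, using facet changes nothing.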

Using group

I tried it both with and without the main parameter, as follows:

http://localhost:8080/solr/books/select/?q=Content:WordOfSearch&sort=PageID%20desc&version=2.2&start=0&rows=10&indent=on&wt=json&group=true&group.field=BookID&group.main=true&hl=true&hl.fl=*&hl.simple.pre=&hl.simple.post=<%2Fspan>

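group partially solves the problem: it returns one result for each book whose pages contain WordOfSearch. However, it breaks the pagination in my application, where I rely on numFound in the response to get the total number of records. With the group solution, numFound equals the number found by the same query without grouping. That is, it counts the documents with repeated BookID values, so the last pages of the pagination come up empty. So, how can I use group, or any other solution, to get the exact number of results without the duplicated BookID values?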
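For reference, Solr's result grouping has a `group.ngroups=true` parameter that reports the number of distinct groups. The sketch below is only an illustration, not a tested fix for this setup. It assumes `group.main` is left off, so that Solr returns the default grouped response format where `ngroups` appears under `grouped.BookID`. The helper names `grouped_query_url`, `book_count`, and `page_count` are made up for this example.

```python
from urllib.parse import urlencode

# Core URL taken from the question.
SOLR_SELECT = "http://localhost:8080/solr/books/select"


def grouped_query_url(word, start=0, rows=10):
    """Build a grouped query that also asks Solr for the number of groups."""
    params = {
        "q": f"Content:{word}",
        "sort": "PageID desc",
        "start": start,
        "rows": rows,              # with grouping, rows limits the number of groups
        "wt": "json",
        "group": "true",
        "group.field": "BookID",
        "group.ngroups": "true",   # adds "ngroups" to the grouped response
        # Assumption: group.main is deliberately omitted, because the flattened
        # main result list only carries numFound (the matching page count).
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def book_count(response):
    """Read the distinct-BookID count from a decoded grouped JSON response."""
    return response["grouped"]["BookID"]["ngroups"]


def page_count(total_books, rows=10):
    """Number of result pages needed to show total_books groups."""
    return (total_books + rows - 1) // rows
```

With this shape, the application would paginate on `book_count(...)` instead of `numFound`, and read the documents from `grouped.BookID.groups` rather than from the flat `response.docs` list.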

Answer 1

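It sounds like you are trying to find the list of books that contain pages with the desired keyword, and that you do not care about the specific pages.

In that case, you may want to have a separate set of documents representing the books (rather than only the pages) and search using the Join Query Parser.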
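As a rough sketch of what such a join query could look like: the example below assumes the book-level documents live in the same core as the pages, carry the same BookID field, and are marked with a hypothetical `DocType` field (`book` vs. `page`) that is not in the posted schema. The schema would also need adjusting, since PageID and Content are currently required on every document. The function name `books_matching_url` is invented for illustration.

```python
from urllib.parse import urlencode

# Core URL taken from the question.
SOLR_SELECT = "http://localhost:8080/solr/books/select"


def books_matching_url(word, start=0, rows=10):
    """Build a query returning one document per book that has a matching page."""
    params = {
        # Match pages on Content, collect their BookID values ("from"),
        # then return documents whose BookID ("to") is in that set.
        "q": "{!join from=BookID to=BookID}Content:" + word,
        # Hypothetical field: keep only the book-level documents, since the
        # join alone would also return the page documents of those books.
        "fq": "DocType:book",
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

Because only book documents come back, `numFound` in the response would then be the number of matching books, which is the figure the pagination needs.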
