Solr query (q) or filter query (fq)

Time: 2012-07-24 08:57:18

Tags: solr

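I have a Solr index of ~1 million product documents. I also have a bunch of UI filters such as category, tags, price range, size, color, and a few others.

Is it correct to have q select everything (q=*:*) and put all the other filters in fq? For example: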

fq=(catid:90 OR catid:81) AND priceEng:[38 TO 40] AND (size:39 OR size:40 OR size:41 OR size:50 OR size:72) AND (colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange ...) AND (companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR ...) AND endShipDays:[* TO 7]

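To me, everything from categories to companyIds, colors, sizes and so on is just a filter. Will this approach cause performance problems as the index grows? Should I put some of these conditions in q, and if so, which ones?

Thanks,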

4 answers:

Answer 0: (score: 44)

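Wherever possible, prefer filter queries (fq) over the normal query (q).

A filter query can take advantage of the FilterCache, which can be a large performance gain compared with putting the same conditions in the main query.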

Answer 1: (score: 8)

I would look at the following points about a field in order to decide:

  1. Does the field have a fixed boost, or do you need scoring on this field at all? If yes, put it in the query because, as mentioned above, filter queries do not contribute to the score.
  2. Is the condition on this field used frequently? If yes, then again the filter cache may give a big advantage; if not, a filter query may even be slower.
  3. Is your index static? This is similar to #2. If the index is updated frequently, filter queries may become a bottleneck instead of giving a performance boost.

Some notes about #3: In my experience, I had a big index that received new docs every few seconds, with autoSoftCommit also set to a few seconds. Each soft commit opened a new searcher, which invalidated the caches, so in practice the filter cache hit ratio was almost always 0. I can say more: I found that the first run of a filter query is more expensive than running a query with all those filter conditions moved into "q" instead of "fq". For example, my query took 1 second with 5 filter queries (no cache hit) and 147 ms when I moved all the "fq" conditions into the main query with "AND". Of course, once I stopped the index updates, the same filter queries took 0 ms because the cache was used. So this is something to consider.
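The effect described in these notes can be illustrated with a toy model. The Python sketch below is not Solr code: `hit_ratio` and `commit_every` are hypothetical names, the cache is modelled as a plain set of filter strings, and every new searcher is assumed to start with an empty cache (real Solr can autowarm the filter cache, which this model ignores).

```python
def hit_ratio(requests, commit_every):
    """Return the fraction of filter lookups served from the toy cache.

    requests:     list of fq strings, one filter per request
    commit_every: a new searcher opens after this many requests (0 = never)
    """
    cache = set()
    hits = 0
    for i, fq in enumerate(requests):
        # Assumption: a commit opens a new searcher and the cache starts cold.
        if commit_every and i > 0 and i % commit_every == 0:
            cache.clear()
        if fq in cache:
            hits += 1
        else:
            cache.add(fq)  # first use of a filter pays the full cost
    return hits / len(requests)


# Ten requests that alternate between two filters.
requests = ["catid:90", "size:39"] * 5

print(hit_ratio(requests, commit_every=0))  # static index: 0.8
print(hit_ratio(requests, commit_every=1))  # commit between every request: 0.0
```

With a commit between every request, each filter is always computed from scratch, which matches the "hit ratio almost always 0" observation.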

Also few other points for your question:

  • Try to avoid wildcards in your query; they significantly affect performance. So instead of "*:*" I would suggest putting in "q" the one condition that varies most between requests, and moving to "fq" the conditions that stay mostly constant between requests and do not need scoring.
  • Range searches are also better avoided where possible, and open-ended ranges with wildcards even more so. This applies to your "endShipDays:[* TO 7]". For example, "endShipDays:(1 2 3 4 5 6 7)" would be more efficient, but that is only one example; there are many ways to do it.
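The range rewrite from the last bullet can be generated instead of typed by hand. This is a minimal sketch: `explicit_terms` is a hypothetical helper, and it assumes that endShipDays only holds whole days starting at 1 and that the default query operator is OR. Note that "[* TO 7]" would also match 0 or negative values, which the explicit list does not.

```python
def explicit_terms(field, values):
    """Render a field restricted to an explicit list of values.

    Assumption: the terms are combined with the default operator OR.
    """
    return f"{field}:({' '.join(str(v) for v in values)})"


# Whole days 1..7 in place of the open-ended range [* TO 7].
fq = explicit_terms("endShipDays", range(1, 8))
print(fq)  # endShipDays:(1 2 3 4 5 6 7)
```

Whether this is actually faster than the range depends on the field type and Solr version, so it is worth measuring on your own index.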

Hope it helps.

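Answer 2: (score: 5)

This is how I use q and fq: I apply the full-text search in q and all the filters in fq. Suppose you have a field named keyword that you populate with copyField.

Fields defined in the schema for full-text search: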
<copyField source="id" dest="keyword"/>
<copyField source="category" dest="keyword"/>
<copyField source="product_name" dest="keyword"/>
<copyField source="color" dest="keyword"/>
<copyField source="location" dest="keyword"/>
<copyField source="price" dest="keyword"/>
<copyField source="title" dest="keyword"/>
<copyField source="description" dest="keyword"/>

My queries look like this:

/select?q={keyword}&fq=category:fashion&fq=location:nyc

/select?q=jeans&fq=category:fashion&fq=location:nyc

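As digitaljoel suggested, if you need to filter on multiple fields, it is better to use multiple fq parameters (see the queries above) than to combine the conditions with AND and OR in q.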
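A request like the ones above can be assembled programmatically. This is a minimal sketch using only the Python standard library; `build_select_url` is a hypothetical helper, and the "/select" path is taken from the example queries. Passing a list of pairs to `urlencode` lets the fq parameter repeat.

```python
from urllib.parse import urlencode


def build_select_url(keyword, filters, base="/select"):
    """Build a select URL with the keyword in q and one fq per filter.

    filters: list of (field, value) pairs, each sent as its own fq parameter
    """
    params = [("q", keyword)]
    params += [("fq", f"{field}:{value}") for field, value in filters]
    # urlencode percent-encodes the colon, which Solr decodes on receipt.
    return f"{base}?{urlencode(params)}"


url = build_select_url("jeans", [("category", "fashion"), ("location", "nyc")])
print(url)  # /select?q=jeans&fq=category%3Afashion&fq=location%3Anyc
```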

Note: in my case q searches the keyword field by default, as defined in solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
<!-- default values for query parameters can be specified, these
     will be overridden by parameters in the request
  -->
 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">keyword</str>
 </lst>
</requestHandler>

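Answer 3: (score: 0)

Think about your query and put everything that does not need scoring and is repeatable into fq parameters. That way, consecutive queries sent to the Solr node between searcher openings can reuse the information stored in the FilterCache.

The filter cache stores each unique filter as a key; the value is a bit array in which each entry says whether a given document matches that filter. This makes it cheap to re-apply the filter to the next query. Of course, you give up scoring for those conditions.

Looking at your query, I would simplify it a bit by using multiple fq values, along these lines: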
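The key-to-bit-array idea can be sketched in a few lines of Python. This is a toy model, not Solr's implementation: `ToyFilterCache` and the sample documents are made up for illustration, and a Python int stands in for the bit array (bit i is set when document i matches).

```python
class ToyFilterCache:
    """Maps a filter string to a bitmask of matching document ids."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def bits(self, key, predicate):
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            mask = 0
            # Cache miss: test every document once and remember the result.
            for doc_id, doc in enumerate(self.docs):
                if predicate(doc):
                    mask |= 1 << doc_id
            self.cache[key] = mask
        return self.cache[key]


docs = [
    {"catid": 90, "size": 39},
    {"catid": 81, "size": 44},
    {"catid": 12, "size": 40},
    {"catid": 90, "size": 72},
]
cache = ToyFilterCache(docs)

cat = cache.bits("catid:(90 OR 81)", lambda d: d["catid"] in (90, 81))
size = cache.bits("size:(39 OR 40 OR 72)", lambda d: d["size"] in (39, 40, 72))

# Applying both filters is a bitwise AND of the two cached masks.
both = cat & size
matching = [i for i in range(len(docs)) if (both >> i) & 1]
print(matching)  # [0, 3]

# Asking for the same filter again is served from the cache.
cache.bits("catid:(90 OR 81)", lambda d: d["catid"] in (90, 81))
print(cache.hits, cache.misses)  # 1 2
```

Because each filter is cached under its own key, a later request that reuses only the category filter still gets a hit, which is one reason to keep filters in separate fq parameters.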

fq=(catid:90 OR catid:81)
fq=priceEng:[38 TO 40]
fq=(size:39 OR size:40 OR size:41 OR size:50 OR size:72)
fq=(colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange  ... ) 
fq=(companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR ... ) 
fq=endShipDays:[* TO 7]

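Filters are additive (each fq further narrows the result set), so the query returns the same results, but at least for me it is easier to manage :)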
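If the filters come from UI selections, the separate fq values can be generated from them. This is a minimal sketch with hypothetical helper names (`or_group`, `range_filter`, `build_fq_list`); the field names are the ones from the question, and only fields whose values were not elided there are used.

```python
def or_group(field, values):
    """Render one multi-select filter as a parenthesized OR group."""
    return "(" + " OR ".join(f"{field}:{v}" for v in values) + ")"


def range_filter(field, low, high):
    """Render a range filter; None means an open end, written as *."""
    lo = "*" if low is None else low
    hi = "*" if high is None else high
    return f"{field}:[{lo} TO {hi}]"


def build_fq_list(selections, ranges):
    """Return one fq string per filter, skipping empty selections."""
    fqs = [or_group(f, vals) for f, vals in selections.items() if vals]
    fqs += [range_filter(f, lo, hi) for f, (lo, hi) in ranges.items()]
    return fqs


fqs = build_fq_list(
    {"catid": [90, 81], "size": [39, 40, 41, 50, 72]},
    {"priceEng": (38, 40), "endShipDays": (None, 7)},
)
for fq in fqs:
    print("fq=" + fq)
```

Each string would then be sent as its own fq parameter alongside q=*:* or a keyword query.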