Question

I have the following documents:

doc1
    description: "A doggo is a small dog."
doc2
    description: "My dog is small."
doc3
    description: "My cat is lazy."

I search the documents with the following query:

description:*dog* OR small

Documents returned: doc1 and doc2

Now I want to get the total term frequency of each word in the query. To do that, I tried the termfreq() function:

termfreq(description, *dog*)
termfreq(description, small)

The result looks like this:

doc1
    description: "A doggo is a small dog."
    termfreq(description, *dog*): 0
    termfreq(description, small): 1
doc2
    description: "My dog is small."
    termfreq(description, *dog*): 0
    termfreq(description, small): 1

However, the result should look like this:

doc1
    description: "A doggo is a small dog."
    termfreq(description, *dog*): 2
    termfreq(description, small): 1
doc2
    description: "My dog is small."
    termfreq(description, *dog*): 1
    termfreq(description, small): 1

My question: is it possible to use wildcards in the termfreq function?

If yes: how?
If no: is there another way to get the term frequency for a partial-word query?

Edit:

Managed schema

<fieldType name="descriptionNGram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="30"/>
  </analyzer>
</fieldType>

<field name="description" stored="true" type="descriptionNGram" multiValued="false" indexed="true"/>

Answer 1

If you can live without the prefix wildcard, you can use the TermsComponent and set terms.lower to the token from which to start iterating.
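As a rough illustration of the TermsComponent approach, the Python sketch below only builds the request URL and does not contact Solr. The host, the core name `mycore`, and the helper `build_terms_url` are hypothetical; `terms.prefix` is an optional extra beyond what the answer mentions, added here to stop the iteration once terms no longer start with the token. Note that the TermsComponent reports index-wide counts per term rather than per-document values, so check whether that is enough for your case.

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, start_token, limit=10):
    """Build a TermsComponent request URL (no network access happens here).

    base_url is assumed to point at a Solr core, e.g.
    http://localhost:8983/solr/mycore (hypothetical host and core name).
    """
    params = {
        "terms": "true",
        "terms.fl": field,             # field whose indexed terms are listed
        "terms.lower": start_token,    # start iterating at this token
        "terms.prefix": start_token,   # optional: keep only terms with this prefix
        "terms.limit": limit,          # maximum number of terms returned
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"


if __name__ == "__main__":
    print(build_terms_url("http://localhost:8983/solr/mycore", "description", "dog"))
```

Fetching the printed URL with any HTTP client should list the indexed terms of `description` starting at `dog`, provided the `/terms` handler is available on your core.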

If you need the prefix wildcard, you have to index NGrams so that each word yields one token per combination of letters. For doggo, you would get tokens such as do, og, gg, go, and so on.
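To make the NGram idea concrete, here is a small Python simulation of the analysis chain from the question (tokenize, lowercase, then emit every gram between 2 and 30 characters). It is only a sketch: the regular expression stands in for StandardTokenizerFactory, the function names are made up for illustration, and the order of grams may differ from what Lucene emits. It assumes that a term frequency lookup on such a field counts how often the exact gram was indexed, so you would ask for the plain gram `dog` rather than `*dog*`.

```python
import re


def ngrams(token, min_size=2, max_size=30):
    """Return every substring of token with length in [min_size, max_size]."""
    grams = []
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_terms(text):
    """Approximate the analyzer: split into words, lowercase, expand to grams."""
    words = re.findall(r"\w+", text.lower())
    terms = []
    for word in words:
        terms.extend(ngrams(word))
    return terms


def term_freq(text, term):
    """Count how many indexed grams equal the given term."""
    return index_terms(text).count(term.lower())


if __name__ == "__main__":
    print(ngrams("doggo"))
    print(term_freq("A doggo is a small dog.", "dog"))  # 2
    print(term_freq("My dog is small.", "dog"))         # 1
```

Under these assumptions the counts match the result the question asks for: the gram `dog` occurs once inside `doggo` and once as the word `dog`.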

Solr 8.1.0 - Get term frequency for partial words
