How to improve query performance when handling characters such as "_", "-", and "." while generating facets in MarkLogic 8

Time: 2017-02-16 08:10:59

Tags: xquery marklogic marklogic-8

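I have 2 million documents in my MarkLogic 8 database. Each document has a title element whose string value is joined by characters such as "_", "-", and ".". The values are not unique, however.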

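I created a facet on title by setting up a range index through the Admin API. Since then, my search query takes a long time to process. I even tried creating a field on title, with tokenizer-class overrides that remove the three characters, plus a field range index on it, but I still face the same performance problem. Is there any way to improve the query performance?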

Some sample title values from my XML documents:

<title>04_COL_9921_Ch03_213_344.indd</title>
<title>Sch4e_fig_9_10.ai</title>
<title>10027 Separation of plasmid DNA.ai</title>
<title>MCK_03598_12_P06.psd</title>

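xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=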
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <term>
      <term-option>case-insensitive</term-option>
    </term>
    <constraint name="Title">
      <range collation="http://marklogic.com/collation/" facet="true">
        <field ns="" name="title" />
      </range>
    </constraint>
    <return-results>true</return-results>
    <return-query>true</return-query>
  </options>
let $result := search:search("**", $options, 1, 20)
return $result
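
To narrow down where the time goes, one option is a variant of the query above that caps the facet and asks for timing metrics. The following is only a sketch under assumptions: the `type="xs:string"` attribute, the `limit=10` cap, and the empty query string are illustrative choices, not part of the original setup, and whether capping the facet helps will depend on how many distinct titles the 2 million documents contain.

```xquery
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Sketch only: same "Title" constraint, but the facet is limited to the
   10 most frequent values and search:metrics is included in the response.
   type="xs:string" and limit=10 are assumptions for illustration. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <constraint name="Title">
      <range type="xs:string" collation="http://marklogic.com/collation/" facet="true">
        <field name="title"/>
        <facet-option>limit=10</facet-option>
        <facet-option>frequency-order</facet-option>
        <facet-option>descending</facet-option>
      </range>
    </constraint>
    <return-metrics>true</return-metrics>
  </options>
(: An empty query string matches all documents. :)
return search:search("", $options, 1, 20)
```

The `search:metrics` element in the response reports query resolution time and facet resolution time separately, which should indicate whether the facet or the search itself dominates.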

Field

<field>
  <field-name>title</field-name>
  <field-path>
    <path>content/feed/descriptive/title</path>
    <weight>1.0</weight>
  </field-path>
  <word-lexicons/>
  <included-elements/>
  <excluded-elements/>
  <tokenizer-overrides>
    <tokenizer-override>
      <character>_</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
    <tokenizer-override>
      <character>-</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
    <tokenizer-override>
      <character>.</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
  </tokenizer-overrides>
  <stemmed-searches>basic</stemmed-searches>
  <fast-phrase-searches>true</fast-phrase-searches>
  <fast-case-sensitive-searches>true</fast-case-sensitive-searches>
  <fast-diacritic-sensitive-searches>true</fast-diacritic-sensitive-searches>
  <three-character-searches>true</three-character-searches>
</field>
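
A quick way to see what the tokenizer overrides in the field definition above actually do to a title is to tokenize one of the sample values against the field. This is a minimal sketch that assumes the optional third (field name) argument of `cts:tokenize` is available in the MarkLogic version in use; the `"en"` language is also an assumption.

```xquery
xquery version "1.0-ml";

(: Sketch: print each token produced for a sample title when the
   "title" field's tokenizer overrides are applied.
   Assumes cts:tokenize accepts a field name as its third argument. :)
for $token in cts:tokenize("04_COL_9921_Ch03_213_344.indd", "en", "title")
return xdmp:describe($token)
```

Tokenization governs how word queries against the field are resolved. Whether it has any effect on the string values held in the field range index, which is what the facet reads, is worth verifying separately.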

Field range index

<range-field-index>
  <scalar-type>string</scalar-type>
  <field-name>title</field-name>
  <collation>http://marklogic.com/collation/</collation>
  <range-value-positions>false</range-value-positions>
</range-field-index>
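
For reference, a field range index like the one above could be created through the Admin API roughly as follows. This is a sketch, not the exact script used for the question: the database name `Documents` is a placeholder, and the field `title` is assumed to already exist.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a hypothetical database name; substitute the real one. :)
let $config := admin:get-configuration()
let $db-id := xdmp:database("Documents")
(: string range index on field "title", root collation, no value positions :)
let $index := admin:database-range-field-index(
  "string", "title", "http://marklogic.com/collation/", fn:false())
let $config := admin:database-add-range-field-index($config, $db-id, $index)
return admin:save-configuration($config)
```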

0 answers:

No answers