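I have 2 million documents in my MarkLogic 8 database. Each document has a title element whose string value is made up of parts joined by characters such as "_", "-", and ".". The values are not unique, though.
I created a facet on title by setting up a range index through the Admin API. Since then, my search query takes a long time to process. I also tried creating a field on title, with a field range index, that uses tokenizer-class to remove those three characters, but I still see the same performance problem. Is there any way to improve query performance?
Some sample title values from my XML documents: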
<title>04_COL_9921_Ch03_213_344.indd</title>
<title>Sch4e_fig_9_10.ai</title>
<title>10027 Separation of plasmid DNA.ai</title>
<title>MCK_03598_12_P06.psd</title>
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <search-option>unfiltered</search-option>
    <term>
      <term-option>case-insensitive</term-option>
    </term>
    <constraint name="Title">
      <range collation="http://marklogic.com/collation/" facet="true">
        <field ns="" name="title" />
      </range>
    </constraint>
    <return-results>true</return-results>
    <return-query>true</return-query>
  </options>
let $result := search:search("**", $options, 1, 20)
return $result
Field
<field>
  <field-name>title</field-name>
  <field-path>
    <path>content/feed/descriptive/title</path>
    <weight>1.0</weight>
  </field-path>
  <word-lexicons/>
  <included-elements/>
  <excluded-elements/>
  <tokenizer-overrides>
    <tokenizer-override>
      <character>_</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
    <tokenizer-override>
      <character>-</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
    <tokenizer-override>
      <character>.</character>
      <tokenizer-class>remove</tokenizer-class>
    </tokenizer-override>
  </tokenizer-overrides>
  <stemmed-searches>basic</stemmed-searches>
  <fast-phrase-searches>true</fast-phrase-searches>
  <fast-case-sensitive-searches>true</fast-case-sensitive-searches>
  <fast-diacritic-sensitive-searches>true</fast-diacritic-sensitive-searches>
  <three-character-searches>true</three-character-searches>
</field>
Field range index
<range-field-index>
  <scalar-type>string</scalar-type>
  <field-name>title</field-name>
  <collation>http://marklogic.com/collation/</collation>
  <range-value-positions>false</range-value-positions>
</range-field-index>
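One way to check whether the facet itself is the slow part might be to read values directly from the field range index and time that on its own, outside of search:search. The following is only a minimal sketch, assuming the title field and field range index shown in this post are in place; the limit of 20 is an arbitrary illustrative choice, not something taken from my configuration.

```xquery
xquery version "1.0-ml";

(: Sketch: read the first 20 values straight from the "title" field
   range index and report how long the request has taken so far.
   Assumes the field and field range index from this post exist;
   "limit=20" is an illustrative value only. :)
let $values :=
  cts:field-values(
    "title",
    (),
    ("collation=http://marklogic.com/collation/", "limit=20")
  )
return (
  $values,
  xdmp:elapsed-time()
)
```

If this returns quickly while the search:search call is still slow, the time is probably being spent somewhere other than the lexicon lookup, for example in resolving the facet over all 2 million documents.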