MarkLogic search: problem searching for terms that contain a hyphen (-)

Time: 2017-06-12 12:36:08

Tags: search marklogic

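I am searching with the term compte-courant. I have a thesaurus file containing an entry whose value is "comptes-courants". When I run the search, it also returns documents that contain only "comptes" or "compte". For those result documents, <search:highlight> is not available. Please help me get only the documents that contain compte-courant or comptes-courants. The search options and the result content are included below.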

Search options

<options xmlns="http://marklogic.com/appservices/search">
<debug>false</debug>
<search-option>score-logtfidf</search-option>
<search-option>unfiltered</search-option>
<term>
    <term-option>case-insensitive</term-option>
    <term-option>diacritic-insensitive</term-option>
    <term-option>punctuation-insensitive</term-option>
</term>
<quality-weight>5.0</quality-weight>
<return-constraints>false</return-constraints>
<return-facets>true</return-facets>
<return-qtext>true</return-qtext>
<return-query>false</return-query>
<return-results>true</return-results>
<return-metrics>true</return-metrics>
<return-similar>false</return-similar>
<transform-results apply="src-snippet" ns="/src-snippet"
    at="/src-snippet.xqy">
    <per-match-tokens>30</per-match-tokens>
    <max-matches>4</max-matches>
    <max-snippet-chars>200</max-snippet-chars>
    <preferred-elements />
</transform-results>
<additional-query>
    <cts:and-query xmlns:cts="http://marklogic.com/cts">
        <cts:directory-query depth="infinity">
            <cts:uri>/DOCS/</cts:uri>
        </cts:directory-query>
        <cts:word-query>
            <cts:text xml:lang="en">compte-courant</cts:text>
            <cts:text xml:lang="en">comptes-courants</cts:text>
        </cts:word-query>
    </cts:and-query>
</additional-query>
<sort-order type="xs:date" direction="descending">
    <element ns="" name="sortabledate" />
    <annotation>Sort by Date</annotation>
</sort-order>
<sort-order direction="descending">
    <score />
</sort-order>
<grammar>
    <starter strength="40" apply="grouping" delimiter=")">(</starter>
    <starter strength="10" apply="prefix" element="cts:not-query">-</starter>
    <joiner strength="30" apply="infix" element="cts:and-query"
        tokenize="word">AND</joiner>
    <joiner strength="20" apply="infix" element="cts:or-query"
        tokenize="word">OR</joiner>
</grammar>
</options>

search:search("", $searchOptions, "1", "4") returns the following result.

Result

<search:response snippet-format="src-snippet" total="640"
start="1" page-length="10" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="" xmlns:search="http://marklogic.com/appservices/search">
<search:result index="1" uri="/DOCS/JK_KAJD-10194_MAR03.xml"
    path="fn:doc(&quot;/DOCS/JK_KAJD-10194_MAR03.xml&quot;)" score="61440"
    confidence="0.363169" fitness="0.534633">
    <search:snippet>
        <src-term />
        <tool-tip>
            <search:match
                path="fn:doc(&quot;/DOCS/JK_KAJD-10194_MAR03.xml&quot;)/CASEDOC">Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
                2016 - n° C-156/15 JK_KAJD-10194_MAR03
            </search:match>
        </tool-tip>
        <title>Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
            2016 - n° C-156/15
        </title>

    </search:snippet>
</search:result>
<search:result index="2" uri="/DOCS/JP_KID-630822_MAR03.xml"
    path="fn:doc(&quot;/DOCS/JP_KID-630822_MAR03.xml&quot;)" score="61440"
    confidence="0.363169" fitness="0.534633">
    <search:snippet>
        <src-term />
        <tool-tip>
            <search:match
                path="fn:doc(&quot;/DOCS/JP_KID-630822_MAR03.xml&quot;)/CASEDOC">Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
                JP_KID-630822_MAR03
            </search:match>
        </tool-tip>
        <title>Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
        </title>

    </search:snippet>
</search:result>
<search:result index="3" uri="/DOCS/JP_KICA-0031257_MAR03.xml"
    path="fn:doc(&quot;/DOCS/JP_KICA-0031257_MAR03.xml&quot;)" score="98304"
    confidence="0.459376" fitness="0.676263">
    <search:snippet>
        <src-term>compte-courant</src-term>
        <tool-tip>
            <search:match
                path="fn:doc(&quot;/DOCS/JP_KICA-0031257_MAR03.xml&quot;)/CASEDOC/*:body/*:content/*:judgments/*:judgment/*:judgmentbody/*:considerations[2]/p[13]/text">
                ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
                redevances sur un
                <search:highlight>compte-courant</search:highlight>
                personnel valait mise à disposition des redevances de la location
                gérance à M. X....
            </search:match>
        </tool-tip>
        <title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
            14-21.758
        </title>
    </search:snippet>
</search:result>
<search:result index="4" uri="/DOCS/JP_KASS-0007470_MAR03.xml"
    path="fn:doc(&quot;/DOCS/JP_KASS-0007470_MAR03.xml&quot;)" score="98304"
    confidence="0.459376" fitness="0.676263">
    <search:snippet>
        <src-term>compte-courant</src-term>
        <tool-tip>
            <search:match>
                ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
                redevances sur un
                <search:highlight>compte-courant</search:highlight>
                personnel valait mise à disposition des redevances de la location
                gérance à M. X....
            </search:match>
        </tool-tip>
        <title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
            14-21.755
        </title>
    </search:snippet>
</search:result>
<search:qtext />
<search:metrics>
    <search:query-resolution-time>PT0S</search:query-resolution-time>
    <search:facet-resolution-time>PT0S</search:facet-resolution-time>
    <search:snippet-resolution-time>PT0.672S
    </search:snippet-resolution-time>
    <search:total-time>PT0.672S</search:total-time>
</search:metrics>
</search:response>

1 answer:

Answer 0 (score: 0)

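I think the problem you are running into is that the default search grammar treats the hyphen as "not", so it is looking for "compte" or "comptes" without "courant". It may be a different problem, but I know we hit this on a recent project... Try adding a different search grammar to your options. You could also wrap your search string in quotes so that it is treated as a literal, but then you would lose "fuzzy" matching.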
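One way to check whether the hyphen really is being parsed as "not" is to look at the query the grammar produces. The sketch below is only an illustration, not something from the original project: it uses search:parse with a cut-down, hypothetical options node that keeps the "-" starter from the question's grammar and adds the quotation and implicit elements, then parses the term both unquoted and quoted so the two resulting queries can be compared.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical, cut-down options for illustration only.
   The "-" starter is copied from the question's grammar;
   <quotation> and <implicit> are assumptions added so that
   quoted phrases and multi-word input can be parsed. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <grammar>
      <quotation>"</quotation>
      <implicit>
        <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
      </implicit>
      <starter strength="10" apply="prefix" element="cts:not-query">-</starter>
      <joiner strength="30" apply="infix" element="cts:and-query"
          tokenize="word">AND</joiner>
    </grammar>
  </options>
return (
  (: Unquoted: inspect how the grammar handles the hyphenated term. :)
  search:parse('compte-courant', $options),
  (: Quoted: the term should be kept together as a single phrase. :)
  search:parse('"compte-courant"', $options)
)
```

If the first result contains a cts:not-query and the second does not, the grammar is the likely cause. Note that the grammar in the question has no quotation element, so quoting the search string would probably only help once one is added.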

If the root cause is what I think it is, the following grammar should work here:

    <grammar xmlns="http://marklogic.com/appservices/search">
        <quotation>"</quotation>
        <implicit>
        <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
        </implicit>
        <starter strength="30" apply="grouping" delimiter=")">(</starter>
        <joiner strength="10" apply="infix" element="cts:or-query"
        tokenize="word">OR</joiner>
        <joiner strength="20" apply="infix" element="cts:and-query"
        tokenize="word">AND</joiner>
        <joiner strength="30" apply="infix" element="cts:near-query"
        tokenize="word">NEAR</joiner>
        <joiner strength="30" apply="near2" consume="2"
        element="cts:near-query">NEAR/</joiner>
        <joiner strength="32" apply="boost" element="cts:boost-query"
        tokenize="word">BOOST</joiner>
        <joiner strength="35" apply="not-in" element="cts:not-in-query"
        tokenize="word">NOT_IN</joiner>
        <joiner strength="50" apply="constraint">:</joiner>
        <joiner strength="50" apply="constraint" compare="LT"
        tokenize="word">LT</joiner>
        <joiner strength="50" apply="constraint" compare="LE"
        tokenize="word">LE</joiner>
        <joiner strength="50" apply="constraint" compare="GT"
        tokenize="word">GT</joiner>
        <joiner strength="50" apply="constraint" compare="GE"
        tokenize="word">GE</joiner>
        <joiner strength="50" apply="constraint" compare="NE"
        tokenize="word">NE</joiner>
    </grammar>