Question

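I have an eXist-db database with a few (large) TEI XML files that I want to index and search. For indexing, I have an xmlpipe2 command that calls the sphinx-out.xql URL provided by eXist-db. Besides the actual text fragments (paragraphs, headings, notes, etc.), this provides some attributes that I want to use later when rendering search results. One of them is a crumbtrail field that contains HTML (more precisely, a series of <a> hyperlinks).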

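Since I want to offer sentence and paragraph operators in searches, I set index_sp = 1. Because that in turn requires HTML stripping, I also have html_strip = 1. But this seems to strip the HTML from my attributes as well, and I want to keep it...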

Here is what sphinx-out.xql, and thus the xmlpipe2 command, outputs:

<sphinx:docset>
<sphinx:document id="77">
  <sphinx_docid>77</sphinx_docid>
  <sphinx_work>W0013</sphinx_work>
  <sphinx_author>Vitoria, Francisco de</sphinx_author>
  <sphinx_title>Relectiones</sphinx_title>
  <sphinx_year>1557</sphinx_year>
  <sphinx_crumbtrail>
    <span class="crumbtrail">
      <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02">Vol. 2</a>
      <span class="tokenizer"> &gt; </span>
      <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02Lect01">De augmento charitatis</a>
    </span>
  </sphinx_crumbtrail>
  <sphinx_description>
    <p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p_l3w_pml_y4">
      [SNIP]
    </p>
  </sphinx_description>
</sphinx:document>
 .
 .
 .
</sphinx:docset>

And here is what Sphinx returns for a query over its MySQL interface:

mysql> select sphinx_docid, sphinx_work, sphinx_crumbtrail from salamanca_base;
+------+--------+--------------+-------------+---------------------------------+
| id   | weight | sphinx_docid | sphinx_work | sphinx_crumbtrail               |
+------+--------+--------------+-------------+---------------------------------+
  .
  .
  .
|   77 |      1 |           77 | W0013       | Vol. 2 > De augmento charitatis |
+------+--------+--------------+-------------+---------------------------------+
20 rows in set (0.00 sec)
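To see why the crumbtrail comes back as plain text, here is a rough emulation of tag stripping in Python. It is only a stand-in built on the standard library's `html.parser`, not Sphinx's own stripper; `TagStripper` and `strip_html` are hypothetical names. It keeps text content, decodes `&gt;`, drops every tag, and collapses whitespace, which reproduces the value shown in the table above.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical stand-in for html_strip: keeps text, drops all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &gt; into ">".
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Only character data is collected; start/end tags are ignored.
        self.parts.append(data)


def strip_html(markup: str) -> str:
    parser = TagStripper()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Collapse whitespace runs into single spaces.
    return " ".join("".join(parser.parts).split())


crumbtrail = """
<span class="crumbtrail">
  <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02">Vol. 2</a>
  <span class="tokenizer"> &gt; </span>
  <a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02Lect01">De augmento charitatis</a>
</span>
"""

print(strip_html(crumbtrail))  # Vol. 2 > De augmento charitatis
```

The links and the `tokenizer` span disappear entirely; only the visible labels and the decoded separator survive.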

Now I wonder: is there a way to disable HTML stripping for attributes?

Can anyone at least confirm that it is possible to store HTML in Sphinx attributes?
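One workaround that is not part of this thread, sketched here under assumptions and not verified against Sphinx: store the crumbtrail base64-encoded, so the stored value contains no `<`, `>` or `&` for an HTML stripper (or an XML parser) to act on, and decode it when rendering results. The encoding step would have to happen in the XQuery that produces the xmlpipe2 feed (assumed; not shown), and the encoded value is no longer useful for searching, only for display. `encode_for_attribute` and `decode_attribute` are hypothetical helper names.

```python
import base64
import string


def encode_for_attribute(markup: str) -> str:
    """Hypothetical: encode HTML as base64 text before it goes into the feed."""
    return base64.b64encode(markup.encode("utf-8")).decode("ascii")


def decode_attribute(value: str) -> str:
    """Hypothetical: turn the stored attribute back into HTML at render time."""
    return base64.b64decode(value.encode("ascii")).decode("utf-8")


crumbtrail = (
    '<span class="crumbtrail">'
    '<a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02">Vol. 2</a>'
    '<span class="tokenizer"> &gt; </span>'
    '<a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02Lect01">'
    "De augmento charitatis</a></span>"
)

stored = encode_for_attribute(crumbtrail)
# Standard base64 uses only letters, digits, "+", "/" and "=".
assert set(stored) <= set(string.ascii_letters + string.digits + "+/=")
print(decode_attribute(stored) == crumbtrail)  # True
```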

Thanks for any insights.

Answer 1

Maybe use html_index_attrs so that the span and a elements are not stripped?

html_index_attrs = span=class; a=href
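As I understand the Sphinx documentation, html_index_attrs names markup attributes whose values are kept and indexed as text while the HTML stripper runs; the tags themselves are still removed. The sketch below is a rough emulation of that idea in Python, not Sphinx's implementation (in particular, exactly where Sphinx places the retained values in the text is not modeled). `parse_index_attrs`, `AttrIndexingStripper` and `strip_keeping_attrs` are hypothetical names; the `tag=attr1,attr2; tag2=attr` value format follows the syntax used in the setting above.

```python
from html.parser import HTMLParser


def parse_index_attrs(spec: str) -> dict[str, set[str]]:
    """Parse e.g. 'span=class; a=href' into {'span': {'class'}, 'a': {'href'}}."""
    result: dict[str, set[str]] = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        tag, _, attrs = entry.partition("=")
        names = {a.strip().lower() for a in attrs.split(",") if a.strip()}
        result.setdefault(tag.strip().lower(), set()).update(names)
    return result


class AttrIndexingStripper(HTMLParser):
    """Drops tags but keeps the values of the configured attributes as text."""

    def __init__(self, keep: dict[str, set[str]]) -> None:
        super().__init__(convert_charrefs=True)
        self.keep = keep
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        wanted = self.keep.get(tag, set())
        for name, value in attrs:
            if name in wanted and value:
                self.parts.append(f" {value} ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_keeping_attrs(markup: str, spec: str) -> str:
    parser = AttrIndexingStripper(parse_index_attrs(spec))
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


crumbtrail = (
    '<span class="crumbtrail">'
    '<a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02">Vol. 2</a>'
    '<span class="tokenizer"> &gt; </span>'
    '<a href="/exist/apps/salamanca/work.html?wid=W0013#Vol02Lect01">'
    "De augmento charitatis</a></span>"
)

print(strip_keeping_attrs(crumbtrail, "span=class; a=href"))
```

In this emulation the href and class values end up as plain words next to the labels, but no `<a>` or `<span>` markup remains, so on its own it would not give back renderable HTML.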
