Question

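I am using Solr to index some documents in .html format, and I would like to configure it to interpret the <p> tag as a multi-valued field of the document. That way, I hope a search can return specific paragraphs rather than the whole document.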

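I have already managed to configure Solr Cell (ExtractingRequestHandler) to capture only the text of the paragraphs, but that content ends up stored as a single value. For example: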

In schema.xml:

...

    <!-- Main body of document extracted by SolrCell. -->
    <field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>

...

In solrconfig.xml:

...
<str name="xpath">//xhtml:p/text()</str>
...

OR

In schema.xml:

...
<field name="p" type="text_general" indexed="true" stored="true" multiValued="true"/>
...

In solrconfig.xml:

...
<str name="capture">p</str>
<str name="fmap.p">p</str>
...

In both cases, the result is a single string rather than a list of paragraphs:

[
...
content: ["text of paragraphs"]
...
]

OR

 [
    ...
    p: ["text of paragraphs"]
    ...
 ]
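
To make the desired result concrete, here is a minimal sketch of the document shape I am after, with one value per paragraph in the multi-valued field. It is not a Solr Cell setting: it splits the HTML on the client side with Python's standard html.parser, and the names ParagraphCollector and html_to_solr_doc, the "id" field, and the sample HTML are all hypothetical. Sending the resulting document to Solr is left out.

```python
import json
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element as a separate list entry."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # one string per completed <p>
        self._buffer = None    # text chunks of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # an unclosed <p> is ended by the next <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept as well.
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()          # keep a trailing <p> that was never closed

    def _flush(self):
        if self._buffer is not None:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def html_to_solr_doc(doc_id, html):
    """Builds a dict whose "p" field holds one value per paragraph.

    "p" matches the multiValued field from the schema above;
    "id" is an assumed unique-key field.
    """
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return {"id": doc_id, "p": parser.paragraphs}


if __name__ == "__main__":
    sample = ("<html><body><p>First paragraph.</p>"
              "<p>Second <b>one</b>.</p></body></html>")
    print(json.dumps(html_to_solr_doc("doc1", sample), indent=2))
```

A document like this, sent as JSON with a list for "p", would be stored as separate values in a multiValued field, which is the result I would like to get from the extraction itself.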

Thank you very much!


Solr

0 answers: