Question

I am trying to index documents with the Windows version of the post tool, using a command like the one below:

java -Dc=docs -Dauto=yes -Ddata=files -Drecursive=yes -jar post.jar C:\docs

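I can see that the documents are indexed correctly, but I want to store the extracted text so that I can use highlighting. I added fields to my managed schema, such as: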

<field name="text" type="text_general" multiValued="true" indexed="true" stored="true"/>
<field name="source" type="text_general" multiValued="true" indexed="true" stored="true"/>
<field name="content" type="text_general" multiValued="true" indexed="true" stored="true"/>
<field name="content" type="strings"/>

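But it does not work: the document content is not returned in my search results. How can I store the text extracted from doc, docx, and pdf files and have it returned by my queries?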

Answer 1

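post.jar only performs the indexing. Once a document is indexed you can search its content; whether that content is also stored is controlled by the stored attribute (true/false) on the field in the schema file.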

Highlighting only works on fields that are stored.
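
To illustrate the query side, here is a minimal sketch, assuming the extracted text ends up in a stored field named content. The field name and the choice to set these as handler defaults are assumptions for illustration; the same hl and hl.fl parameters can be passed on the request URL instead.

```xml
<!-- solrconfig.xml: illustrative defaults that turn on highlighting for /select -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- enable highlighting -->
    <str name="hl">true</str>
    <!-- build highlight snippets from the stored "content" field (assumed name) -->
    <str name="hl.fl">content</str>
  </lst>
</requestHandler>
```

If the field named in hl.fl is not stored, no snippets can be built for it.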

Check this link to see how indexing and searching are done.

Answer 2

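bin/post (not sure about post.jar, but I believe it does the same) reports, for each file, the type it detected and the handler it submits the file to.

For example, MS Word, PDF, and similar files all go to the /extract handler, which uses Tika to extract the content.

Then, if you look up the definition of the /extract handler in solrconfig.xml, you will see the parameters that control how the extracted content is mapped, including the names of the target fields. You can then make those fields stored and reindex.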
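
As a rough sketch of what to look for, assuming a configset where the handler is registered as /update/extract and the extracted body is mapped with fmap.content: the handler name, the target field name content, and the parameter values below are illustrative assumptions, so compare them against your own solrconfig.xml rather than copying them.

```xml
<!-- solrconfig.xml: illustrative extract handler (values are assumptions) -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- normalize Tika metadata field names to lowercase with underscores -->
    <str name="lowernames">true</str>
    <!-- map the extracted body text to the "content" field -->
    <str name="fmap.content">content</str>
  </lst>
</requestHandler>

<!-- managed-schema: the target field must be stored to be returned or highlighted -->
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

If the mapping in your config points at a different field, for example a catch-all field defined with stored="false", that is the field definition to change before reindexing.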

How to store document content in Solr 6.4?
