Content field not displayed in search results, even though I used curl to add the following line to the schema resource specified by managedSchemaResourceName:
<field name="content" stored="true" type="text_general" indexed="true"/>
I am using the schema managed by ManagedIndexSchemaFactory.
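Because the ExtractingRequestHandler is already defined in solrconfig.xml by default, I used ManagedIndexSchemaFactory. I added the content field line so that the indexed content is shown when a user runs a query, because with the default settings the content is not displayed. I added it with curl as follows: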
$ curl -X POST -H 'Content-type:application/json' --data-binary '{
"update-field" :
{ "name":"text", "type":"text_general", "stored":true, "indexed":true, "storeOffsetsWithPositions":true}
}' http://localhost:8983/solr/collection1/schema
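The field definition at the top of this question is for a field named content, while the curl call above targets text. As far as I know, the Solr 5 Schema API's bulk commands for fields are add-field, delete-field and replace-field; I don't believe update-field is one of them. A minimal sketch of the equivalent add-field request body for content follows. Host and core name come from the question; post_schema_command is a hypothetical helper and is not called, so nothing is sent.

```python
import json
import urllib.request

# Host and core name taken from the question; adjust for your setup.
SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"


def add_field_command(name, field_type, stored=True, indexed=True):
    """Build a Schema API 'add-field' command body for a single field."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }


def post_schema_command(command, url=SCHEMA_URL):
    """Hypothetical helper: POST a schema command as JSON (needs a running Solr)."""
    body = json.dumps(command).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Mirrors <field name="content" stored="true" type="text_general" indexed="true"/>
command = add_field_command("content", "text_general")
print(json.dumps(command, indent=2))
```

If a content field already exists, add-field should be rejected; replace-field with the complete field definition would be the command to use instead.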
I indexed the document with the following command:
java -Dc=collection1 -Dauto=true -jar example\exampledocs\post.jar example\exampledocs\solr-word.pdf
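As far as I can tell, in auto mode post.jar sends a PDF like this to the /update/extract handler and uses the file path as the document id, which matches the id values in the results below. A minimal sketch of building that request URL by hand is shown here. The fmap.content=content parameter is my assumption: it asks ExtractingRequestHandler to map Tika's extracted body explicitly into the content field instead of relying on whatever fmap.content default the stock solrconfig.xml defines. The helper only builds the URL; nothing is sent.

```python
from urllib.parse import urlencode

# Core name and host from the question.
EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def extract_url(file_path, content_field="content", commit=True):
    """Build an /update/extract URL for one file.

    literal.id sets the document id (post.jar uses the file path);
    fmap.content maps Tika's extracted body into content_field (assumption).
    """
    params = {
        "literal.id": file_path,
        "resource.name": file_path,
        "fmap.content": content_field,
        "commit": "true" if commit else "false",
    }
    return EXTRACT_URL + "?" + urlencode(params)


url = extract_url(r"C:\Users\GHI\solr-5.0.0\example\exampledocs\solr-word.pdf")
print(url)
```

The PDF bytes themselves would go in the POST body with Content-type: application/pdf.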
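The document was indexed successfully. When I search for any word in its content, the search returns the document ID and other information such as subject, author and date. However, the content of the document is not displayed.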
These are the results I get.
If I do not request the content field in the fl parameter, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"indent": "true",
"q": "*:*",
"_": "1425362114731",
"wt": "json"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{
"id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf",
"meta_save_date": [
"2008-11-13T00:00:00Z"
],
"dc_subject": [
"solr, word, pdf"
],
"subject": [
"solr word"
],
"author": [
"Grant Ingersoll"
],
"dcterms_created": [
"2008-11-13T00:00:00Z"
],
"date": [
"2008-11-13T00:00:00Z"
],
"creator": [
"Grant Ingersoll"
],
"creation_date": [
"2008-11-13T00:00:00Z"
],
"title": [
"solr-word"
],
"meta_author": [
"Grant Ingersoll"
],
"stream_content_type": [
"application/pdf"
],
"created": [
"Thu Nov 13 13:35:51 UTC 2008"
],
"stream_size": [
21052
],
"meta_keyword": [
"solr, word, pdf"
],
"cp_subject": [
"solr word"
],
"dc_format": [
"application/pdf; version=1.3"
],
"xmp_creatortool": [
"Microsoft Word"
],
"resourcename": [
"C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf"
],
"keywords": [
"solr, word, pdf"
],
"last_save_date": [
"2008-11-13T00:00:00Z"
],
"dc_title": [
"solr-word"
],
"dcterms_modified": [
"2008-11-13T00:00:00Z"
],
"meta_creation_date": [
"2008-11-13T00:00:00Z"
],
"dc_creator": [
"Grant Ingersoll"
],
"pdf_pdfversion": [
1.3
],
"last_modified": [
"2008-11-13T00:00:00Z"
],
"aapl_keywords": [
"solr, word, pdf"
],
"x_parsed_by": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pdf.PDFParser"
],
"modified": [
"2008-11-13T00:00:00Z"
],
"xmptpg_npages": [
1
],
"pdf_encrypted": [
false
],
"producer": [
"Mac OS X 10.5.5 Quartz PDFContext"
],
"content_type": [
"application/pdf"
],
"_version_": 1494155334466404300
},
{
"id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf",
"meta_save_date": [
"2015-02-25T00:00:00Z"
],
"author": [
"GHI"
],
"dcterms_created": [
"2015-02-25T00:00:00Z"
],
"date": [
"2015-02-25T00:00:00Z"
],
"creator": [
"GHI"
],
"creation_date": [
"2015-02-25T00:00:00Z"
],
"title": [
"This is another test of PDF extraction in Solr"
],
"meta_author": [
"GHI"
],
"stream_content_type": [
"application/pdf"
],
"created": [
"Wed Feb 25 08:32:19 UTC 2015"
],
"stream_size": [
10345
],
"dc_format": [
"application/pdf; version=1.4"
],
"xmp_creatortool": [
"PDFCreator Version 1.3.2"
],
"resourcename": [
"C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf"
],
"last_save_date": [
"2015-02-25T00:00:00Z"
],
"dc_title": [
"This is another test of PDF extraction in Solr"
],
"dcterms_modified": [
"2015-02-25T00:00:00Z"
],
"meta_creation_date": [
"2015-02-25T00:00:00Z"
],
"dc_creator": [
"GHI"
],
"pdf_pdfversion": [
1.4
],
"last_modified": [
"2015-02-25T00:00:00Z"
],
"x_parsed_by": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pdf.PDFParser"
],
"modified": [
"2015-02-25T00:00:00Z"
],
"xmptpg_npages": [
1
],
"pdf_encrypted": [
false
],
"producer": [
"GPL Ghostscript 9.05"
],
"content_type": [
"application/pdf"
],
"_version_": 1494155342991327200
}
]
}
}
If I request the content field in the fl parameter, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"fl": "content",
"indent": "true",
"q": "*:*",
"_": "1425362147661",
"wt": "json"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{},
{}
]
}
}
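One quick way to read the two responses above is to check, for each returned document, whether the requested field is present at all. A small stdlib sketch, fed with the fl=content response above (trimmed): an empty object for a document means it has no stored value for content, for example because nothing was written to that field at index time or the field was not stored when the document was indexed.

```python
import json

# The fl=content response shown above, trimmed to the relevant parts.
RESPONSE = """
{
  "responseHeader": {"status": 0, "params": {"fl": "content", "q": "*:*"}},
  "response": {"numFound": 2, "start": 0, "docs": [{}, {}]}
}
"""


def docs_missing_field(response_text, field):
    """Return (number of docs returned, number of docs lacking the field)."""
    docs = json.loads(response_text)["response"]["docs"]
    missing = sum(1 for doc in docs if field not in doc)
    return len(docs), missing


total, missing = docs_missing_field(RESPONSE, "content")
print(f"{missing} of {total} returned documents have no stored 'content' value")
```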
If I run q=content:[* TO *]&fl=id,content, I get:
{
"responseHeader":{
"status":0,
"QTime":5,
"params":{
"fl":"id,content",
"q":"content:[* TO *]"}},
"response":{"numFound":0,"start":0,"docs":[]
}
}
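The content:[* TO *] query above matches documents that have at least one indexed term in content, so numFound: 0 suggests nothing was indexed into that field at all, not only that it was not stored. When sending this query outside the browser, the colon, brackets, asterisks and spaces need URL encoding. A minimal sketch, assuming the default /select handler on the core from the question:

```python
from urllib.parse import urlencode

# Assumes the default /select handler on the core used in the question.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def field_exists_query(field, fl=("id",)):
    """Build a URL matching documents with any indexed term in the field."""
    params = {
        "q": f"{field}:[* TO *]",
        "fl": ",".join(fl),
        "wt": "json",
    }
    return SELECT_URL + "?" + urlencode(params)


print(field_exists_query("content", fl=("id", "content")))
```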
I was able to get this working in Solr 4.10.1, but it does not work in Solr 5.0. What has changed in Solr 5.0 compared with previous versions that I need to be aware of?
Answer (score: 1)
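I have only worked with Solr 5 and later, but hopefully this helps: for a field to be searchable, its content has to end up in the "text" field. For example, if you have a set of fields such as: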
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="keywords" type="text_general" indexed="true" stored="true"/>
<field name="resourcename" type="text_general" indexed="true" stored="true"/>
<field name="url" type="text_general" indexed="true" stored="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
and you want them to be searchable, you have to add the corresponding copyField directives with text as the destination:
<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="content_type" dest="text"/>
<copyField source="resourcename" dest="text"/>
<copyField source="url" dest="text"/>
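Because the question uses ManagedIndexSchemaFactory, hand-editing the schema file is discouraged; the copyField rules above could instead be sent through the Schema API's add-copy-field command. A minimal sketch that builds that request body from the same source list. It assumes the bulk API accepts an array of add-copy-field commands; if your version rejects that, send them one per request. Nothing is posted.

```python
import json

# Source fields taken from the copyField lines above; all copy into "text".
SOURCES = [
    "title", "author", "description", "keywords",
    "content", "content_type", "resourcename", "url",
]


def copy_field_commands(sources, dest="text"):
    """Build one Schema API 'add-copy-field' body covering every source."""
    return {"add-copy-field": [{"source": s, "dest": dest} for s in sources]}


body = copy_field_commands(SOURCES)
print(json.dumps(body, indent=2))
```

The body would be POSTed with Content-type: application/json to the same /schema endpoint as the curl call in the question.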