Unable to display indexed content in Solr 5.0

Time: 2015-03-04 05:41:36

Tags: solr

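The content field is not displayed in search results, even though I used curl to add the following line to the schema, in the resource specified by managedSchemaResourceName.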

<field name="content" stored="true" type="text_general" indexed="true"/>

I am using the schema from ManagedIndexSchemaFactory.

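Because ExtractingRequestHandler is already defined in solrconfig.xml by default, I use ManagedIndexSchemaFactory. I added the content field line so that the indexed content is displayed when a user runs a query, since the default settings do not return the content. I added it with curl as follows: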

$ curl -X POST -H 'Content-type:application/json' --data-binary '{
"update-field" :

{ "name":"text", "type":"text_general", "stored":true, "indexed":true, "storeOffsetsWithPositions":true}

}' http://localhost:8983/solr/collection1/schema
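Note that the command above names the field `text`, whereas the field the question wants returned is `content`. As a minimal sketch (not the asker's actual fix), the snippet below builds a Schema API request body that defines a stored `content` field. It assumes the `add-field` command of the Solr Schema API and the same `collection1` URL as above; the helper name `build_add_field_payload` is hypothetical, and the request body is only printed, not sent.

```python
import json

# Endpoint taken from the curl command above.
SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"


def build_add_field_payload(name, field_type="text_general"):
    """Return a JSON body asking the Schema API to add a stored, indexed field.

    Hypothetical helper: "add-field" is assumed to be the right command for a
    field that does not yet exist in the managed schema.
    """
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,   # stored=true lets the value appear in results
            "indexed": True,  # indexed=true lets the value be searched
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    # Print the body that could be passed to curl via --data-binary.
    print(SCHEMA_URL)
    print(build_add_field_payload("content"))
```

If the field already exists in the managed schema, a different Schema API command would probably be needed, so treat the command name as something to check against the documentation for your Solr version.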

I indexed the document with the following command: java -Dc=collection1 -Dauto=true -jar example\exampledocs\post.jar example\exampledocs\solr-word.pdf

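The document is indexed successfully, and when I search for any word in the content, the search returns the document ID and other information such as subject, author, and date. However, the content of the document is not displayed.

This is what I get from the results.

If I do not request the content field in the fl parameter, this is what I get: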

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1425362114731",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf",
        "meta_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_subject": [
          "solr, word, pdf"
        ],
        "subject": [
          "solr word"
        ],
        "author": [
          "Grant Ingersoll"
        ],
        "dcterms_created": [
          "2008-11-13T00:00:00Z"
        ],
        "date": [
          "2008-11-13T00:00:00Z"
        ],
        "creator": [
          "Grant Ingersoll"
        ],
        "creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "title": [
          "solr-word"
        ],
        "meta_author": [
          "Grant Ingersoll"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Thu Nov 13 13:35:51 UTC 2008"
        ],
        "stream_size": [
          21052
        ],
        "meta_keyword": [
          "solr, word, pdf"
        ],
        "cp_subject": [
          "solr word"
        ],
        "dc_format": [
          "application/pdf; version=1.3"
        ],
        "xmp_creatortool": [
          "Microsoft Word"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf"
        ],
        "keywords": [
          "solr, word, pdf"
        ],
        "last_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_title": [
          "solr-word"
        ],
        "dcterms_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "meta_creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_creator": [
          "Grant Ingersoll"
        ],
        "pdf_pdfversion": [
          1.3
        ],
        "last_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "aapl_keywords": [
          "solr, word, pdf"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2008-11-13T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "Mac OS X 10.5.5 Quartz PDFContext"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155334466404300
      },
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf",
        "meta_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "author": [
          "GHI"
        ],
        "dcterms_created": [
          "2015-02-25T00:00:00Z"
        ],
        "date": [
          "2015-02-25T00:00:00Z"
        ],
        "creator": [
          "GHI"
        ],
        "creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "title": [
          "This is another test of PDF extraction in Solr"
        ],
        "meta_author": [
          "GHI"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Wed Feb 25 08:32:19 UTC 2015"
        ],
        "stream_size": [
          10345
        ],
        "dc_format": [
          "application/pdf; version=1.4"
        ],
        "xmp_creatortool": [
          "PDFCreator Version 1.3.2"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf"
        ],
        "last_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_title": [
          "This is another test of PDF extraction in Solr"
        ],
        "dcterms_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "meta_creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_creator": [
          "GHI"
        ],
        "pdf_pdfversion": [
          1.4
        ],
        "last_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2015-02-25T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "GPL Ghostscript 9.05"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155342991327200
      }
    ]
  }
}

If I request the content field in the fl parameter, this is what I get:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "fl": "content",
      "indent": "true",
      "q": "*:*",
      "_": "1425362147661",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {},
      {}
    ]
  }
}

If I run a query such as q=content:[* TO *]&fl=id,content, I get:
{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "fl":"id,content",
      "q":"content:[* TO *]"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }
}
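The responses above can also be checked mechanically. The following is a rough sketch that assumes the JSON response has already been parsed; `docs_missing_field` is a hypothetical helper, and the sample data is a trimmed copy of the `fl=content` response shown earlier.

```python
import json


def docs_missing_field(response, field):
    """Return the indexes of documents whose returned fields lack `field`."""
    docs = response.get("response", {}).get("docs", [])
    return [i for i, doc in enumerate(docs) if field not in doc]


# Trimmed copy of the fl=content response above: both docs come back empty.
SAMPLE = json.loads("""
{
  "responseHeader": {"status": 0, "params": {"fl": "content", "q": "*:*"}},
  "response": {"numFound": 2, "start": 0, "docs": [{}, {}]}
}
""")

if __name__ == "__main__":
    missing = docs_missing_field(SAMPLE, "content")
    total = SAMPLE["response"]["numFound"]
    print(f"{len(missing)} of {total} documents returned no 'content' value")
```

Empty documents for `fl=content`, combined with zero hits for `content:[* TO *]`, suggest that nothing was written to a field named `content` at index time, rather than the field being indexed but not stored.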

I was able to get this working in Solr 4.10.1, but it does not work in Solr 5.0. What do I need to be aware of in Solr 5.0 that differs from previous Solr versions?

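1 Answer:

Answer 0 (score: 1)

I have only used Solr version 5 and later, but hopefully this helps: for a field to be searchable, it must be of type "text". For example, if you have a set of fields like: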

   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="resourcename" type="text_general" indexed="true" stored="true"/>
   <field name="url" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>

and you want them to be searchable, you must add the corresponding copy fields to text:

   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="content_type" dest="text"/>
   <copyField source="resourcename" dest="text"/>
   <copyField source="url" dest="text"/>
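The copyField list above follows one pattern per source field. As an illustrative sketch only, the snippet below generates those elements from a list of field names; `copy_field_elements` is a hypothetical helper, and the field names are the ones listed in the answer.

```python
import xml.etree.ElementTree as ET


def copy_field_elements(sources, dest="text"):
    """Return one <copyField> XML string per source field, all targeting `dest`."""
    lines = []
    for source in sources:
        element = ET.Element("copyField", {"source": source, "dest": dest})
        lines.append(ET.tostring(element, encoding="unicode"))
    return lines


if __name__ == "__main__":
    fields = ["title", "author", "description", "keywords",
              "content", "content_type", "resourcename", "url"]
    # Print the elements so they can be pasted into the schema by hand.
    for line in copy_field_elements(fields):
        print(line)
```

Copying a field into `text` only affects what can be searched through `text`; whether a field's value is returned in results still depends on that field's own `stored` setting.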