Why does Solr not respond with the complete document on all nodes?

Date: 2015-02-10 17:39:02

Tags: solr apache-zookeeper solrcloud

TL;DR: skip to the bottom

Environment

I am working with Solr 4.10.1 (solr-impl 4.10.1 1627268 - mike - 2014-09-24 06:07:51) in a setup with 3 servers and 3 shards. As far as I know, there is one ZooKeeper involved.

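The schema contains products and prices. Product data is inserted/updated by one job, and the price fields are initially absent from the submitted documents. They are added by a separate job that sends only the product-id and price fields. For this, an updateRequestProcessorChain is set up: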

<updateRequestProcessorChain name="versionable_chain" default="false">
    <processor class="solr.DocBasedVersionConstraintsProcessorFactory">
        <str name="versionField">price_last_generation_id</str>
        <bool name="ignoreOldUpdates">true</bool>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
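To make the intended behavior of this chain concrete, here is a simplified Python model of the decision it makes for each incoming document. It is only a sketch based on the configuration above and on the error message quoted further down, not the actual Solr implementation. The function name `check_version` and the representation of documents as plain dicts are made up for illustration.

```python
VERSION_FIELD = "price_last_generation_id"  # versionField from the chain above


def check_version(existing_doc, new_doc, ignore_old_updates=True):
    """Return True if new_doc should be indexed, False if it is silently dropped.

    Simplified model, not Solr code: documents are plain dicts and
    existing_doc is None when the id is not in the index yet.
    """
    if existing_doc is None:
        return True  # nothing to compare against
    old_version = existing_doc.get(VERSION_FIELD)
    if old_version is None:
        # Mirrors the message seen in the box-1 log.
        raise ValueError(
            "Doc exists in index, but has null versionField: " + VERSION_FIELD
        )
    if new_doc[VERSION_FIELD] > old_version:
        return True  # strictly newer version wins
    if ignore_old_updates:
        return False  # ignoreOldUpdates=true: drop without an error
    raise ValueError("version conflict")
```

In this model, a product document that was indexed without `price_last_generation_id` makes a later price update fail instead of being applied, which would match the "null versionField" error.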

The cloud layout is as follows. I have simplified the names. The numbering is irrelevant:

                  /-- box-3 (active)
       /- shard1--+-- box-1 (active leader)
       |          \-- box-2 (down)
       |
       |          /-- box-2 (down)
- foo -+- shard2--+-- box-3 (active)
       |          \-- box-1 (active leader)
       |
       |          /-- box-1 (active leader)
       \- shard3--+-- box-3 (active)
                  \-- box-2 (down)

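I am currently not sure why box-2 shows as down, but I am fairly sure the problem had already appeared while that box was still active and I was accessing the frontend there. Comments on how to fix this would also be appreciated.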

The current data sizes are shown below. I collected the information from the admin web frontend of all three boxes.

box-2

foo_shard1_replica3:
  Last Modified:2 months ago (2014-11-30)
  Num Docs:234044
  Max Doc:262311
  Heap Memory Usage:1652112
  Deleted Docs:28267
  Version:5893
  Segment Count:10

foo_shard2_replica3:
  Last Modified:2 months ago (2014-11-30)
  Num Docs:303025
  Max Doc:324491
  Heap Memory Usage:1886264
  Deleted Docs:21466
  Version:7317
  Segment Count:11

foo_shard3_replica3:
  Last Modified:2 months ago (2014-11-30)
  Num Docs:349651
  Max Doc:397699
  Heap Memory Usage:1895080
  Deleted Docs:48048
  Version:8893
  Segment Count:12

box-1

foo_shard1_replica1:
  Last Modified:7 days ago
  Num Docs:299185
  Max Doc:348179
  Heap Memory Usage:1920704
  Deleted Docs:48994
  Version:23067
  Segment Count:11

foo_shard2_replica1:
  Last Modified:7 days ago
  Num Docs:379024
  Max Doc:443322
  Heap Memory Usage:2119024
  Deleted Docs:64298
  Version:26871
  Segment Count:12

foo_shard3_replica1:
  Last Modified:7 days ago
  Num Docs:373670
  Max Doc:414497
  Heap Memory Usage:2130464
  Deleted Docs:40827
  Version:29925
  Segment Count:12

box-3

foo_shard1_replica2:
  Last Modified:7 days ago
  Num Docs:299185
  Max Doc:314353
  Heap Memory Usage:1878904
  Deleted Docs:15168
  Version:22740
  Segment Count:11

foo_shard2_replica2:
  Last Modified:7 days ago
  Num Docs:379024
  Max Doc:389958
  Heap Memory Usage:2044384
  Deleted Docs:10934
  Version:26338
  Segment Count:12

foo_shard3_replica2:
  Last Modified:7 days ago
  Num Docs:373670
  Max Doc:402724
  Heap Memory Usage:2127984
  Deleted Docs:29054
  Version:29598
  Segment Count:12
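To compare these figures more easily, the following Python sketch puts the Num Docs, Max Doc and Deleted Docs values from above side by side and reports which replicas hold fewer documents than the largest replica of the same shard. The numbers are copied from the admin pages. The helper name `lagging_replicas` is only illustrative.

```python
# (num_docs, max_doc, deleted_docs) per replica, copied from the admin pages above.
STATS = {
    "shard1": {"box-1": (299185, 348179, 48994),
               "box-2": (234044, 262311, 28267),
               "box-3": (299185, 314353, 15168)},
    "shard2": {"box-1": (379024, 443322, 64298),
               "box-2": (303025, 324491, 21466),
               "box-3": (379024, 389958, 10934)},
    "shard3": {"box-1": (373670, 414497, 40827),
               "box-2": (349651, 397699, 48048),
               "box-3": (373670, 402724, 29054)},
}


def lagging_replicas(stats):
    """Return {shard: {box: missing_docs}} for replicas below the shard maximum."""
    result = {}
    for shard, boxes in stats.items():
        most = max(num for num, _, _ in boxes.values())
        behind = {box: most - num
                  for box, (num, _, _) in boxes.items() if num < most}
        if behind:
            result[shard] = behind
    return result


for shard, behind in lagging_replicas(STATS).items():
    print(shard, behind)
```

For the numbers above, only the box-2 replicas are reported as behind, which fits their "Last Modified: 2 months ago". box-1 and box-3 agree on Num Docs for every shard and differ only in Max Doc and Deleted Docs.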

Problem

When I run a query for a specific document and ask for the document id and price, I sometimes get both fields and sometimes only the id.

GET http://box-2:8080/solr/foo_shard1_replica3/select?q=id%3A%22product-id%22&fl=id%2Cprice&wt=json&indent=true&debug=track

The query yields two different response bodies. This one has the price:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 13.212859,
    "docs": [
      {
        "id": "product-id",
        "price": 174.8
      }
    ]
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {},
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {}
  },
  "debug": {
    "track": {
      "rid": "box-2.internal-foo_shard1_replica3-1423579755522-18",
      "EXECUTE_QUERY": {
        "http://box-3.internal:8080/solr/foo_shard2_replica2/|http://box-1.internal:8080/solr/foo_shard2_replica1/": {
          "ElapsedTime": "4",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "0",
          "Response": "{response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        },
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1.internal:8080/solr/foo_shard1_replica1/": {
          "ElapsedTime": "4",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,maxScore=12.965124,docs=[SolrDocument{id=product-id, score=12.965124}]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        },
        "http://box-1.internal:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {
          "ElapsedTime": "4",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,maxScore=13.212859,docs=[SolrDocument{id=product-id, score=13.212859}]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        }
      },
      "GET_FIELDS": {
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1.internal:8080/solr/foo_shard1_replica1/": {
          "ElapsedTime": "2",
          "RequestPurpose": "GET_FIELDS,GET_DEBUG",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,docs=[SolrDocument{id=product-id, price=174.8}]},debug={}}"
        }
      }
    }
  }
}

This one does not:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "maxScore": 13.2416725,
    "docs": [
      {
        "id": "product-id"
      }
    ]
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {},
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {}
  },
  "debug": {
    "track": {
      "rid": "box-2.internal-foo_shard1_replica3-1423579848055-20",
      "EXECUTE_QUERY": {
        "http://box-3.internal:8080/solr/foo_shard2_replica2/|http://box-1:8080/solr/foo_shard2_replica1/": {
          "ElapsedTime": "3",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "0",
          "Response": "{response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        },
        "http://box-1:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {
          "ElapsedTime": "2",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,maxScore=13.2416725,docs=[SolrDocument{id=product-id, score=13.2416725}]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        },
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1:8080/solr/foo_shard1_replica1/": {
          "ElapsedTime": "4",
          "RequestPurpose": "GET_TOP_IDS,GET_FACETS",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,maxScore=12.965124,docs=[SolrDocument{id=product-id, score=12.965124}]},sort_values={},facet_counts={facet_queries={},facet_fields={},facet_dates={},facet_ranges={},facet_intervals={}},debug={}}"
        }
      },
      "GET_FIELDS": {
        "http://box-1:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {
          "ElapsedTime": "2",
          "RequestPurpose": "GET_FIELDS,GET_DEBUG",
          "NumFound": "1",
          "Response": "{response={numFound=1,start=0,docs=[SolrDocument{id=product-id}]},debug={}}"
        }
      }
    }
  }
}
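To see where the two responses differ, the following Python sketch reduces each "track" section to the shards that reported hits in each phase. The dictionaries are trimmed copies of the responses above and keep only NumFound. The helper name `shards_with_hits` is made up for this illustration.

```python
import re

# Trimmed copies of the "track" sections above: only NumFound is kept.
TRACK_WITH_PRICE = {
    "EXECUTE_QUERY": {
        "http://box-3.internal:8080/solr/foo_shard2_replica2/|http://box-1.internal:8080/solr/foo_shard2_replica1/": {"NumFound": "0"},
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1.internal:8080/solr/foo_shard1_replica1/": {"NumFound": "1"},
        "http://box-1.internal:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {"NumFound": "1"},
    },
    "GET_FIELDS": {
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1.internal:8080/solr/foo_shard1_replica1/": {"NumFound": "1"},
    },
}
TRACK_WITHOUT_PRICE = {
    "EXECUTE_QUERY": {
        "http://box-3.internal:8080/solr/foo_shard2_replica2/|http://box-1:8080/solr/foo_shard2_replica1/": {"NumFound": "0"},
        "http://box-1:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {"NumFound": "1"},
        "http://box-3.internal:8080/solr/foo_shard1_replica2/|http://box-1:8080/solr/foo_shard1_replica1/": {"NumFound": "1"},
    },
    "GET_FIELDS": {
        "http://box-1:8080/solr/foo_shard3_replica1/|http://box-3.internal:8080/solr/foo_shard3_replica2/": {"NumFound": "1"},
    },
}


def shards_with_hits(track):
    """Map each phase to the sorted list of shards that reported NumFound > 0."""
    hits = {}
    for phase, requests in track.items():
        if not isinstance(requests, dict):
            continue  # skips entries such as the "rid" string
        shards = set()
        for urls, info in requests.items():
            if int(info["NumFound"]) > 0:
                shards.update(re.findall(r"_(shard\d+)_replica", urls))
        hits[phase] = sorted(shards)
    return hits


print(shards_with_hits(TRACK_WITH_PRICE))
print(shards_with_hits(TRACK_WITHOUT_PRICE))
```

In both responses the id is found in shard1 and in shard3 during EXECUTE_QUERY. The fields are then fetched from shard1 in the response with the price and from shard3 in the response without it. This suggests, though it does not prove, that the same id exists in two shards and only the shard1 copy carries the price.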

What I have done

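While the data insertion job was running, I looked at the access logs of all three machines. The inserts are distributed more or less evenly. All machines receive some.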

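In the logs that are accessible from the web frontend I found a few errors. They are already some days old but had not occurred before, although the regular update process runs every few hours.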

In the frontend accessed via box-1:

1/31/2015, 8:14:51 AM
ERROR
StreamingSolrServers
error
org.apache.solr.common.SolrException: Internal Server Error



request: http://box-1.internal:8080/solr/foo_shard2_replica1/update?update.chain=versionable_chain&update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fbox-1.internal%3A8080%2Fsolr%2Ffoo_shard1_replica1%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In the frontend accessed via box-2:

1/31/2015, 8:15:13 AM
ERROR
StreamingSolrServers
error
org.apache.solr.common.SolrException: Internal Server Error



request: http://box-1.internal:8080/solr/foo_shard3_replica1/update?update.chain=versionable_chain&update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fbox-2.internal%3A8080%2Fsolr%2Ffoo_shard1_replica3%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In the frontend accessed via box-3:

1/31/2015, 8:13:02 AM
ERROR
SolrDispatchFilter
null:org.apache.solr.common.SolrException: Internal Server Error
null:org.apache.solr.common.SolrException: Internal Server Error



request: http://box-1.internal:8080/solr/foo_shard2_replica1/update?update.chain=versionable_chain&update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fbox-3.internal%3A8080%2Fsolr%2Ffoo_shard1_replica2%2F&wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
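All three entries above concern an update that one core forwards to a core on box-1 with update.distrib=TOLEADER. To read the request lines more easily, this Python sketch decodes one of them with the standard library. The request string is copied from the box-1 entry. The helper name `describe_forward` is only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# Request line copied from the box-1 log entry above.
REQUEST = ("http://box-1.internal:8080/solr/foo_shard2_replica1/update"
           "?update.chain=versionable_chain&update.distrib=TOLEADER"
           "&distrib.from=http%3A%2F%2Fbox-1.internal%3A8080%2Fsolr%2Ffoo_shard1_replica1%2F"
           "&wt=javabin&version=2")


def describe_forward(request_url):
    """Return (sending core URL, receiving core name, update chain)."""
    parts = urlsplit(request_url)
    params = parse_qs(parts.query)  # also decodes the percent-encoding
    target_core = parts.path.split("/")[2]  # path is /solr/<core>/update
    return params["distrib.from"][0], target_core, params["update.chain"][0]


print(describe_forward(REQUEST))
```

For this entry, the update was received by foo_shard1_replica1 and forwarded to foo_shard2_replica1 through versionable_chain, so the Internal Server Error comes back from the receiving core on box-1.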

On box-1 I also have an error concerning the update processor. I do not know whether it is related.

1/31/2015, 8:15:05 AM
ERROR
SolrDispatchFilter
null:org.apache.solr.common.SolrException: Doc exists in index, but has null versionField: price_last_generation_id
null:org.apache.solr.common.SolrException: Doc exists in index, but has null versionField: price_last_generation_id
    at org.apache.solr.update.processor.DocBasedVersionConstraintsProcessorFactory$DocBasedVersionConstraintsProcessor.isVersionNewEnough(DocBasedVersionConstraintsProcessorFactory.java:328)
    at org.apache.solr.update.processor.DocBasedVersionConstraintsProcessorFactory$DocBasedVersionConstraintsProcessor.processAdd(DocBasedVersionConstraintsProcessorFactory.java:399)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Question

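While writing this I noticed that replication might not be working properly, which could be the problem. However, this error was first reported 3 weeks before the replication started to diverge.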

So the question is: why are some documents not complete on all nodes?

The reasonable follow-up: how do I fix this?

I inherited this system, and I would appreciate comments on what I can do to isolate the cause.

0 answers:

No answers yet