Question

我正面临一个奇怪的问题，使用 Django Haystack 和 ElasticSearch ，阻止我对内容编制索引。

ElasticSearch正在机器上正常运行：

# curl -X GET 127.0.0.1:9200
{
  "status" : 200,
  "name" : "White Pilgrim",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T11:25:18Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

但是当我尝试使用Haystack构建我的索引时：

# python manage.py rebuild_index

我收到ProtocolError(('Connection aborted.', error(111, 'Connection refused')))错误：

ERROR:root:Error updating auth using default
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 188, in handle_label
    self.update_backend(label, using)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 233, in update_backend
    do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 96, in do_update
    backend.update(index, current_qs, commit=commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/backends/elasticsearch_backend.py", line 193, in update
    bulk(self.conn, prepped_docs, index=self.index_name, doc_type='modelresult')
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
    raise e
ConnectionError: ConnectionError(('Connection aborted.', error(111, 'Connection refused'))) caused by: ProtocolError(('Connection aborted.', error(111, 'Connection refused')))
Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/rebuild_index.py", line 26, in handle
    call_command('update_index', **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 119, in call_command
    return command.execute(*args, **defaults)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 183, in handle
    return super(Command, self).handle(*items, **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 548, in handle
    label_output = self.handle_label(label, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 188, in handle_label
    self.update_backend(label, using)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 233, in update_backend
    do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 96, in do_update
    backend.update(index, current_qs, commit=commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/backends/elasticsearch_backend.py", line 193, in update
    bulk(self.conn, prepped_docs, index=self.index_name, doc_type='modelresult')
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 89, in _process_bulk_chunk
    raise e
elasticsearch.exceptions.ConnectionError: ConnectionError(('Connection aborted.', error(111, 'Connection refused'))) caused by: ProtocolError(('Connection aborted.', error(111, 'Connection refused')))

奇怪的是：到了这个时候（错误之后），弹性搜索仍然无法访问，甚至是我之前做过的方式：

# curl -v -X GET 127.0.0.1:9200
* Rebuilt URL to: 127.0.0.1:9200/
* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* connect to 127.0.0.1 port 9200 failed: Connexion refusée
* Failed to connect to 127.0.0.1 port 9200: Connexion refusée
* Closing connection 0
curl: (7) Failed to connect to 127.0.0.1 port 9200: Connexion refusée

如果我希望它再次启动并运行，我需要重新启动它：

# sudo service elasticsearch restart

这发生在Raspbian发行版上。在我的MacOX开发机器上，一切正常，索引效果很好。

你有什么想法吗？

记忆信息

$ free

             total       used       free     shared    buffers     cached
Mem:        883148     667632     215516      17312      61468     421536
-/+ buffers/cache:     184628     698520
Swap:       102396          0     102396

$ cat /proc/meminfo

MemTotal:         883148 kB
MemFree:          215492 kB
MemAvailable:     675152 kB
Buffers:           61472 kB
Cached:           421536 kB
SwapCached:            0 kB
Active:           366556 kB
Inactive:         253416 kB
Active(anon):     128720 kB
Inactive(anon):    25556 kB
Active(file):     237836 kB
Inactive(file):   227860 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        102396 kB
SwapFree:         102396 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        136864 kB
Mapped:            30956 kB
Shmem:             17312 kB
Slab:              33560 kB
SReclaimable:      24684 kB
SUnreclaim:         8876 kB
KernelStack:        1696 kB
PageTables:         1948 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      543968 kB
Committed_AS:    1287328 kB
VmallocTotal:    1171456 kB
VmallocUsed:        1140 kB
VmallocChunk:     965220 kB
CmaTotal:           8192 kB
CmaFree:              36 kB

$ vmstat -s

       883148 K total memory
       667624 K used memory
       366456 K active memory
       253416 K inactive memory
       215524 K free memory
        61476 K buffer memory
       421536 K swap cache
       102396 K total swap
            0 K used swap
       102396 K free swap
       764663 non-nice user cpu ticks
            0 nice user cpu ticks
       403552 system cpu ticks
    592012076 idle cpu ticks
       394292 IO-wait cpu ticks
           96 IRQ cpu ticks
         8609 softirq cpu ticks
            0 stolen cpu ticks
      3905593 pages paged in
     13767628 pages paged out
            0 pages swapped in
            0 pages swapped out
    503949992 interrupts
    198893627 CPU context switches
   1460307293 boot time
        61556 forks

通过原始Python访问ES

我还尝试通过原始Python（没有Haystack）访问ES。它似乎有效：

$ sudo service elasticsearch restart
$ python

Python 2.7.9 (default, Mar  8 2015, 00:52:26)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
>>> es
<Elasticsearch([{'host': 'localhost', 'port': 9200}])>
>>> es.get(index='myindex', doc_type='topic', id=1)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_index":"myindex","_type":"topic","_id":"1","found":false}')

好吧，它没有找到任何东西（因为没有索引，嘿！），但它似乎能够查询引擎。

ES日志

默认日志文件并没有告诉我任何有用的信息：

2016-04-25 17:38:14,503][INFO ][node                     ] [White Pilgrim] version[1.7.4], pid[23455], build[0d3159b/2015-12-15T11:25:18Z]
[2016-04-25 17:38:14,506][INFO ][node                     ] [White Pilgrim] initializing ...
[2016-04-25 17:38:15,208][INFO ][plugins                  ] [White Pilgrim] loaded [], sites []
[2016-04-25 17:38:15,542][INFO ][env                      ] [White Pilgrim] using [1] data paths, mounts [[/ (/dev/root)]], net usable_space [8.6gb], net total_space [14.4gb], types [ext4]
[2016-04-25 17:38:32,689][INFO ][node                     ] [White Pilgrim] initialized
[2016-04-25 17:38:32,691][INFO ][node                     ] [White Pilgrim] starting ...
[2016-04-25 17:38:33,241][INFO ][transport                ] [White Pilgrim] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.50:9300]}
[2016-04-25 17:38:33,365][INFO ][discovery                ] [White Pilgrim] elasticsearch/NWl6pT-_QemJtTvKYkSGCQ
[2016-04-25 17:38:37,286][INFO ][cluster.service          ] [White Pilgrim] new_master [White Pilgrim][NWl6pT-_QemJtTvKYkSGCQ][raspberrypi][inet[/192.168.1.50:9300]], reason: zen-disco-join (elected_as_master)
[2016-04-25 17:38:37,449][INFO ][http                     ] [White Pilgrim] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.50:9200]}
[2016-04-25 17:38:37,451][INFO ][node                     ] [White Pilgrim] started
[2016-04-25 17:38:37,664][INFO ][gateway                  ] [White Pilgrim] recovered [1] indices into cluster_state
[2016-04-25 17:47:28,176][INFO ][cluster.metadata         ] [White Pilgrim] [myindex] deleting index
[2016-04-25 17:47:28,964][INFO ][cluster.metadata         ] [White Pilgrim] [myindex] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2016-04-25 17:47:30,412][INFO ][cluster.metadata         ] [White Pilgrim] [myindex] create_mapping [modelresult]

所以我尝试获取更多信息，编辑/etc/elasticsearch/logging.yml配置文件，并询问TRACE级别（es.logger.level: TRACE）。

即使它崩溃了，ES在其日志文件中看起来仍然可以正常运行！

扩展日志（最后一行，在ES崩溃后复制）：

[2016-04-28 11:15:24,817][DEBUG][cluster.service          ] [Franklin Hall] publishing cluster state version 11
[2016-04-28 11:15:24,818][DEBUG][cluster.service          ] [Franklin Hall] set local cluster state to version 11
[2016-04-28 11:15:24,819][TRACE][cluster.routing          ] [Franklin Hall] no need to schedule reroute due to delayed unassigned, next_delay_setting [0], registered [9223372036854775807]
[2016-04-28 11:15:24,827][DEBUG][river.cluster            ] [Franklin Hall] processing [reroute_rivers_node_changed]: execute
[2016-04-28 11:15:24,827][DEBUG][river.cluster            ] [Franklin Hall] processing [reroute_rivers_node_changed]: no change in cluster_state
[2016-04-28 11:15:24,827][TRACE][gateway.local.state.meta ] [Franklin Hall] [posepartage] writing state, reason [version changed from [1] to [2]]
[2016-04-28 11:15:24,849][TRACE][cluster.service          ] ack received from node [[Franklin Hall][cOT-S342Q7OMtv4z2ZkS9Q][raspberrypi][inet[/192.168.1.50:9300]]], cluster_state update (version: 11)
[2016-04-28 11:15:24,850][TRACE][cluster.service          ] all expected nodes acknowledged cluster_state update (version: 11)
[2016-04-28 11:15:24,852][DEBUG][cluster.service          ] [Franklin Hall] processing [put-mapping [modelresult]]: took 240ms done applying updated cluster_state (version: 11)
[2016-04-28 11:15:24,852][TRACE][indices.breaker          ] [Franklin Hall] [REQUEST] Adjusted breaker by [16432] bytes, now [16432]
[2016-04-28 11:15:24,863][TRACE][indices.breaker          ] [Franklin Hall] [REQUEST] Adjusted breaker by [-16432] bytes, now [0]
[2016-04-28 11:15:25,427][TRACE][index.shard              ] [Franklin Hall] [myindex][2] index [modelresult][auth.user.1][org.elasticsearch.index.mapper.ParseContext$Document@1108529]
[2016-04-28 11:15:25,427][TRACE][index.shard              ] [Franklin Hall] [myindex][0] index [modelresult][auth.user.3][org.elasticsearch.index.mapper.ParseContext$Document@1108529]
[2016-04-28 11:15:25,427][TRACE][index.shard              ] [Franklin Hall] [myindex][1] index [modelresult][auth.user.2][org.elasticsearch.index.mapper.ParseContext$Document@1108529]

系统日志

我还试图探索系统日志，以获取有关ES崩溃的信息（因为它似乎没有自己记录关于这一点的任何内容......）。

/var/log/daemon.log：

Apr 28 11:37:48 raspberrypi systemd[1]: Starting Elasticsearch...
Apr 28 11:37:48 raspberrypi systemd[1]: Started Elasticsearch.
Apr 28 11:38:52 raspberrypi systemd[1]: elasticsearch.service: main process exited, code=killed, status=6/ABRT
Apr 28 11:38:52 raspberrypi systemd[1]: Unit elasticsearch.service entered failed state.

/var/log/messages：

Apr 28 11:38:52 raspberrypi rsyslogd-2007: action 'action 17' suspended, next retry is Thu Apr 28 11:39:52 2016 [try http://www.rsyslog.com/e/2007 ]

哎呀！

更进一步，我在StandardOutput=null中将StandardOutput=journal更改为/usr/lib/systemd/system/elasticsearch.service，执行了systemctl daemon-reload，然后journalctl -u elasticsearch.service -f。这样，我就错误获得了更多细节：

avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:04,569][TRACE][indices.breaker          ] [Supreme Intelligence] [REQUEST] Adjusted breaker by [-16432] bytes, now [0]
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,337][TRACE][index.shard              ] [Supreme Intelligence] [myindex][2] index [modelresult][auth.user.1][org.elasticsearch.index.mapper.ParseContext$Document@a2c034]
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,337][TRACE][index.shard              ] [Supreme Intelligence] [myindex][1] index [modelresult][auth.user.2][org.elasticsearch.index.mapper.ParseContext$Document@1fcee4a]
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,337][TRACE][index.shard              ] [Supreme Intelligence] [myindex][0] index [modelresult][auth.user.3][org.elasticsearch.index.mapper.ParseContext$Document@80719c]
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,488][TRACE][index.engine.lucene.iw   ] [Supreme Intelligence] [myindex][1] elasticsearch[Supreme Intelligence][scheduler][T#1] IW: nrtIsCurrent: infoVersion matches: false; DW changes: false; BD changes: false
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,495][TRACE][index.engine.lucene.iw   ] [Supreme Intelligence] [myindex][1] elasticsearch[Supreme Intelligence][refresh][T#1] IW: nrtIsCurrent: infoVersion matches: false; DW changes: false; BD changes: false
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,496][TRACE][index.shard              ] [Supreme Intelligence] [myindex][1] refresh with source: schedule
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,500][TRACE][index.engine.lucene.iw   ] [Supreme Intelligence] [myindex][1] elasticsearch[Supreme Intelligence][refresh][T#1] IW: nrtIsCurrent: infoVersion matches: false; DW changes: false; BD changes: false
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,503][TRACE][index.engine.lucene.iw   ] [Supreme Intelligence] [myindex][1] elasticsearch[Supreme Intelligence][refresh][T#1] IW: flush at getReader
avril 28 12:09:05 raspberrypi elasticsearch[3481]: [2016-04-28 12:09:05,504][TRACE][index.engine.lucene.iw   ] [Supreme Intelligence] [myindex][1] elasticsearch[Supreme Intelligence][refresh][T#1] DW: startFullFlush
avril 28 12:09:05 raspberrypi elasticsearch[3481]: Core dumps have been disabled. To enCore dumps have been disabled. To en#
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # A fatal error has been detected by the Java Runtime Environment:
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #  Internal Error (cppInterpreter_arm.S:2625), pid=3481, tid=625996896
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #  fatal error:         *** Unimplemented opcode: 232 = <unknown>
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # JRE version: OpenJDK Runtime Environment (7.0_95) (build 1.7.0_95-b00)
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # Java VM: OpenJDK Zero VM (24.95-b01 mixed mode linux-arm )
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # Derivative: IcedTea 2.6.4
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # Distribution: Raspbian GNU/Linux 8.0 (jessie), package 7u95-2.6.4-1~deb8u1+rpi1
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # An error report file with more information is saved as:
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # /tmp/hs_err_pid3481.log
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # If you would like to submit a bug report, please include
avril 28 12:09:05 raspberrypi elasticsearch[3481]: # instructions on how to reproduce the bug and visit:
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #   http://icedtea.classpath.org/bugzilla
avril 28 12:09:05 raspberrypi elasticsearch[3481]: #
avril 28 12:09:05 raspberrypi systemd[1]: elasticsearch.service: main process exited, code=killed, status=6/ABRT
avril 28 12:09:05 raspberrypi systemd[1]: Unit elasticsearch.service entered failed state.

Answer 1

elasticsearch停止侦听某个端口这一事实证明了问题所在（它可能不是根本原因，但它是当前的领先优势）。您需要get more info about what is happening there。

Answer 2

尝试将open-jdk更新为上一版本。你能从/tmp/hs_err_pid3481.log显示日志吗？

Answer 3

# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (cppInterpreter_arm.S:2625), pid=3481, tid=625996896
#  fatal error:         *** Unimplemented opcode: 232 = <unknown>
#
# JRE version: OpenJDK Runtime Environment (7.0_95) (build 1.7.0_95-b00)
# Java VM: OpenJDK Zero VM (24.95-b01 mixed mode linux-arm )
# Derivative: IcedTea 2.6.4
# Distribution: Raspbian GNU/Linux 8.0 (jessie), package 7u95-2.6.4-1~deb8u1+rpi1
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid3481.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#

这似乎是openjdk 1.7中的一个错误。有些人遇到了同样的问题：https://discuss.elastic.co/t/openjdk-on-arm-crashes-elastic/27283

解决方案是升级你的jvm。

无法使用Django Haystack重新构建弹性搜索：'连接被拒绝'

记忆信息

通过原始Python访问ES

ES日志

系统日志

3 个答案: