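I have a large Postgres database with 20 million rows, and I want to transfer it to Elasticsearch through Logstash. I followed the advice mentioned here and first tested it with a simple database of 300 rows, where everything worked fine. But when I run it against my main database, I always hit this error: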
nargess@nargess-Surface-Book:/usr/share/logstash/bin$ sudo ./logstash -w 1 -f students.conf --path.data /usr/share/logstash/data/students/ --path.settings /etc/logstash
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3453.hprof ...
Heap dump file created [13385912484 bytes in 53.304 secs]
Exception in thread "Ruby-0-Thread-11: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:216" java.lang.ArrayIndexOutOfBoundsException: -1
at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:729)
at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1292)
at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.RubyProc.call(RubyProc.java:230)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:103)
at java.lang.Thread.run(Thread.java:748)
The signal INT is in use by the JVM and will not work correctly on this platform
Error: Your application used more memory than the safety cap of 12G.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
I edited /etc/logstash/jvm.options and set
-Xms256m
-Xmx12000m
but I still get these errors. I have 13 GB of free memory. How can I use this memory to send the data to Elasticsearch?
This is the student-index.json I use in Elasticsearch:
{
  "aliases": {},
  "warmers": {},
  "mappings": {
    "tab_students_dfe": {
      "properties": {
        "stcode": {
          "type": "text"
        },
        "voroodi": {
          "type": "integer"
        },
        "name": {
          "type": "text"
        },
        "family": {
          "type": "text"
        },
        "namp": {
          "type": "text"
        },
        "lastupdate": {
          "type": "date"
        },
        "picture": {
          "type": "text"
        },
        "uniquename": {
          "type": "text"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  }
}
Then I tried to create this index in Elasticsearch with:
curl -XPUT --header "Content-Type: application/json" \
  http://localhost:9200/students -d @postgres-index.json
Next, this is my configuration file, /usr/share/logstash/bin/students.conf:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    # The path to downloaded jdbc driver
    jdbc_driver_library => "./postgresql-42.2.1.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # The query to run (given inline here, not loaded from a file)
    statement => "select * from students"
  }
}
filter {
  aggregate {
    task_id => "%{stcode}"
    code => "
      map['stcode'] = event.get('stcode')
      map['voroodi'] = event.get('voroodi')
      map['name'] = event.get('name')
      map['family'] = event.get('family')
      map['namp'] = event.get('namp')
      map['uniquename'] = event.get('uniquename')
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 5
  }
}
output {
  elasticsearch {
    document_id => "%{stcode}"
    document_type => "postgres"
    index => "students"
    codec => "json"
    hosts => ["127.0.0.1:9200"]
  }
}
Thanks for your help.
Answer 0 (score: 0)
This is a bit old, but I ran into the same problem, and increasing Logstash's heap size helped me. I added this to my logstash service in the docker-compose file:
environment:
  LS_JAVA_OPTS: "-Xmx2048m -Xms2048m"
Further reading: What are the -Xms and -Xmx parameters when starting JVM?
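For a table of 20 million rows, a larger heap may only postpone the error, since the question's jdbc input runs a single `select *` with no batching. A possible complement, sketched here and not tested against the asker's setup, is to have the jdbc input read rows in batches. `jdbc_fetch_size`, `jdbc_paging_enabled` and `jdbc_page_size` are options of the Logstash jdbc input plugin; the values 10000 and 50000 are arbitrary starting points, and the added `order by stcode` assumes stcode is unique so that pages do not overlap.

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    jdbc_driver_library => "./postgresql-42.2.1.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # Ask the driver for rows in batches instead of the whole result set
    # (10000 is an assumed value; lower it if rows are large)
    jdbc_fetch_size => 10000
    # Split the statement into several LIMIT/OFFSET queries
    # (50000 is an assumed value)
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
    # Ordering is not guaranteed between pages, hence the explicit sort
    # (assumes stcode is unique)
    statement => "select * from students order by stcode"
  }
}
```

The filter and output sections from the question would stay unchanged. If this works as intended, the heap set in jvm.options or LS_JAVA_OPTS may be able to stay well below 12 GB.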