Question

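I have a huge Postgres database with 20 million rows, and I want to transfer it to Elasticsearch via Logstash. I followed the advice mentioned here and tested it on a simple database with 300 rows, and everything worked fine. But when I test it on my main database, I always run into this error: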

nargess@nargess-Surface-Book:/usr/share/logstash/bin$  sudo ./logstash -w 1 -f students.conf --path.data /usr/share/logstash/data/students/ --path.settings /etc/logstash
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid3453.hprof ...
Heap dump file created [13385912484 bytes in 53.304 secs]
Exception in thread "Ruby-0-Thread-11: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/puma-2.16.0-java/lib/puma/thread_pool.rb:216"  java.lang.ArrayIndexOutOfBoundsException: -1
at org.jruby.runtime.ThreadContext.popRubyClass(ThreadContext.java:729)
at org.jruby.runtime.ThreadContext.postYield(ThreadContext.java:1292)
at org.jruby.runtime.ContextAwareBlockBody.post(ContextAwareBlockBody.java:29)
at org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:198)
at org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
at org.jruby.runtime.Block.call(Block.java:101)
at org.jruby.RubyProc.call(RubyProc.java:300)
at org.jruby.RubyProc.call(RubyProc.java:230)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:103)
at java.lang.Thread.run(Thread.java:748)
The signal INT is in use by the JVM and will not work correctly on this platform
Error: Your application used more memory than the safety cap of 12G.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
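
As a quick sanity check on the numbers in this log, the Ruby snippet below converts the reported heap dump size into GiB and GB. It is plain arithmetic and runs on any Ruby. The dump comes out at about 12.5 GiB, which is roughly the size of the 12G safety cap reported at the end of the log. That is consistent with the heap having filled up, although treating the dump size as a proxy for heap usage is an assumption.

```ruby
# Unit conversions only: how large was the heap dump reported in the log?
BYTES_PER_GIB = 1024**3 # binary gibibyte (the JVM's "g" suffix means this)
BYTES_PER_GB = 10**9    # decimal gigabyte

def to_gib(bytes)
  bytes.to_f / BYTES_PER_GIB
end

def to_gb(bytes)
  bytes.to_f / BYTES_PER_GB
end

# Value copied from "Heap dump file created [13385912484 bytes in 53.304 secs]"
DUMP_BYTES = 13_385_912_484

puts format('heap dump: %.2f GiB (%.2f GB)', to_gib(DUMP_BYTES), to_gb(DUMP_BYTES))
# prints "heap dump: 12.47 GiB (13.39 GB)"
```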

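Although I edited /etc/logstash/jvm.options and set -Xms256m -Xmx12000m, I still get these errors. I have 13 GB of free memory. How can I use this memory to send the data to Elasticsearch? This is the student-index.json I use in Elasticsearch: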

{
    "aliases": {},
    "warmers": {},
    "mappings": {
        "tab_students_dfe": {
            "properties": {
                "stcode": {
                    "type": "text"
                },
                "voroodi": {
                    "type": "integer"
                },
                "name": {
                    "type": "text"
                },
                "family": {
                    "type": "text"
                },
                "namp": {
                    "type": "text"
                },
                "lastupdate": {
                    "type": "date"
                },
                "picture": {
                    "type": "text"
                },
                "uniquename": {
                    "type": "text"
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
        }
    }
}

Then I tried to create this index in Elasticsearch with:

curl -XPUT --header "Content-Type: application/json" http://localhost:9200/students -d @postgres-index.json
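
The same index-creation call can be written with Ruby's standard library (Net::HTTP), which makes each part of the curl command explicit: the PUT method, the /students path, the JSON Content-Type header, and the file contents as the body. This is only a sketch. The helper names are hypothetical, and the host and index name are taken from the curl command.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build (but do not send) the equivalent of the curl command:
# PUT <host>/<index> with a JSON body and a JSON Content-Type header.
def build_create_index_request(host, index, mapping_json)
  JSON.parse(mapping_json) # raises JSON::ParserError early if the body is not valid JSON
  uri = URI("#{host}/#{index}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = mapping_json
  [uri, request]
end

# Read the mapping file, send the request, and return the HTTP response.
# Requires a running Elasticsearch at `host`.
def create_index(host, index, mapping_path)
  uri, request = build_create_index_request(host, index, File.read(mapping_path))
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

Usage would look like `create_index('http://localhost:9200', 'students', 'student-index.json')`. Note that the curl command reads postgres-index.json while the mapping above is introduced as student-index.json, so the path passed in must point at whichever file actually contains that mapping.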

Next, this is my configuration file, /usr/share/logstash/bin/students.conf:

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    # The path to the downloaded JDBC driver
    jdbc_driver_library => "./postgresql-42.2.1.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # The SQL query itself (statement_filepath would take a path to a .sql file instead)
    statement => "select * from students"
  }
}
filter {
  aggregate {
    task_id => "%{stcode}"
    code => "
      map['stcode'] = event.get('stcode')
      map['voroodi'] = event.get('voroodi')
      map['name'] = event.get('name')
      map['family'] = event.get('family')
      map['namp'] = event.get('namp')
      map['uniquename'] = event.get('uniquename')
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 5
  }
}
output {
  elasticsearch {
    document_id => "%{stcode}"
    document_type => "postgres"
    index => "students"
    codec => "json"
    hosts => ["127.0.0.1:9200"]
  }
}
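
To make the filter section easier to follow, here is a small, single-threaded Ruby model of what the aggregate block does with push_previous_map_as_event => true. It is a simplification written for illustration, not the plugin's actual code. In this model only one map is alive at a time, and grouping is only correct when rows with the same stcode arrive one after another. A plain `select * from students` does not guarantee that order without an ORDER BY. The aggregate filter's documentation also asks for a single filter worker, which matches the -w 1 on the command line.

```ruby
# Simplified model (NOT Logstash's implementation) of the aggregate filter above:
#   * one map is kept for the current task_id (here: stcode)
#   * each incoming row overwrites the map fields, then the row is cancelled
#   * when a row with a different task_id arrives, the previous map is emitted
#   * the last map is emitted at the end (in Logstash: after `timeout` seconds)
FIELDS = %w[stcode voroodi name family namp uniquename].freeze

def aggregate(rows)
  events = []
  current_id = nil
  map = nil

  rows.each do |row|
    task_id = row['stcode']
    if map.nil? || task_id != current_id
      events << map unless map.nil? # push_previous_map_as_event
      current_id = task_id
      map = {}
    end
    FIELDS.each { |field| map[field] = row[field] } # the `code` block
    # event.cancel(): the original row itself is not emitted
  end

  events << map unless map.nil? # final flush (the timeout in Logstash)
  events
end
```

With sorted input, two rows sharing stcode '1' collapse into one event. With the order 1, 2, 1, the model emits three events, because the first map for '1' has already been pushed when '2' arrives.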

Thanks for your help.

Answer 1

This question is a bit old, but I ran into the same problem, and increasing Logstash's heap size helped me. I added this to the logstash service in my docker-compose file:

environment:
  LS_JAVA_OPTS: "-Xmx2048m -Xms2048m"
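
This answer sets -Xms and -Xmx to the same value, which is what Logstash's JVM settings documentation recommends, whereas the question used -Xms256m -Xmx12000m. The hypothetical Ruby helpers below parse these two flags from an options string so the settings can be compared. They only understand the plain -Xms<size> and -Xmx<size> forms with an optional k/m/g suffix.

```ruby
# Parse -Xms/-Xmx flags (as written in LS_JAVA_OPTS or jvm.options) into bytes.
# JVM size suffixes are binary: k = 1024, m = 1024**2, g = 1024**3.
UNIT_BYTES = { '' => 1, 'k' => 1024, 'm' => 1024**2, 'g' => 1024**3 }.freeze

def heap_flags(java_opts)
  flags = {}
  java_opts.split.each do |opt|
    match = opt.match(/\A-Xm([sx])(\d+)([kmgKMG]?)\z/)
    next unless match # ignore every other JVM option

    key = match[1] == 's' ? :xms : :xmx
    flags[key] = match[2].to_i * UNIT_BYTES.fetch(match[3].downcase)
  end
  flags
end

# True when both flags are present and the initial heap equals the maximum heap.
def fixed_heap?(java_opts)
  flags = heap_flags(java_opts)
  flags.key?(:xms) && flags[:xms] == flags[:xmx]
end
```

For example, `fixed_heap?('-Xmx2048m -Xms2048m')` is true, while `fixed_heap?('-Xms256m -Xmx12000m')` is false.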

Further reading: What are the -Xms and -Xmx parameters when starting JVM?

java.lang.OutOfMemoryError via Logstash
