Question

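I have about 1 million static XML files in one directory. I want to use Logstash to read and parse these files and send the output to Elasticsearch. I have the following input configuration (I have tried many variations; this is my latest version):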

input {
  file {
    path => "/opt/lun/data-unzip/ftp/223/*.xml*"
    exclude => "*.zip"
    type => "223-purplan"
    start_position => beginning
    discover_interval => "3"
    max_open_files => "128"
    close_older => "3"
    codec => multiline {
      pattern => "xml version"
      negate => true
      what => "previous"
      max_lines => "9999"
      max_bytes => "100 MiB"
    }
  }
}

My server runs CentOS 6.8 with the following hardware: 80 GB of RAM and an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with 16 CPUs.

Logstash (5.1.2) and Elasticsearch (5.1.2) are installed on this server.

With this configuration, processing is very slow: about 4 files per second.

How can I parse the files faster?

Answer 1

There are several ways to increase Logstash's throughput, but it is hard to say which one will help in your case. You could try tuning pipeline performance by adjusting pipeline.workers, pipeline.batch.size, and pipeline.batch.delay.

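There are also a few troubleshooting methods for quickly diagnosing and resolving Logstash performance problems. For example, you can try optimizing your input on its own by removing all filters and sending all documents to /dev/null, to make sure there is no bottleneck while processing or outputting the documents.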
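A minimal sketch of such a stripped-down test pipeline, assuming the path, exclude pattern, and multiline settings from the question and the null output plugin, which discards every event (the equivalent of writing to /dev/null). Because there is no filter block and no Elasticsearch output, whatever rate you observe is roughly the ceiling of the input stage alone.

```
input {
  file {
    path => "/opt/lun/data-unzip/ftp/223/*.xml*"
    exclude => "*.zip"
    start_position => "beginning"
    codec => multiline {
      pattern => "xml version"
      negate => true
      what => "previous"
    }
  }
}

# No filter block: parsing is skipped so only the input stage is measured.

output {
  # The null output drops every event, like sending it to /dev/null.
  null {}
}
```

If this pipeline is also limited to a few files per second, the bottleneck is in reading and multiline assembly rather than in filtering or indexing.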

Try adding this line to the file input:

sincedb_path => "/dev/null"
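For context, a sketch of where that line goes, assuming the rest of the question's file input stays as it is (only a few of its options are repeated here). With sincedb_path pointed at /dev/null, the file input does not persist read positions, so on every restart the files are read again from the beginning.

```
input {
  file {
    path => "/opt/lun/data-unzip/ftp/223/*.xml*"
    start_position => "beginning"
    # Do not persist read offsets between runs.
    sincedb_path => "/dev/null"
  }
}
```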

You may also want to look at Tuning and Profiling Logstash Performance and this blog post. Hope this helps!

Logstash reading a large number of static XML files (file input plugin)
