How can I force the encoding for Logstash filters? (Umlauts from messages are not recognized)

Date: 2017-03-31 10:49:20

Tags: elasticsearch logstash logstash-grok

I am trying to import historical log data into Elasticsearch (version 5.2.2) using Logstash (version 5.2.1), all running under Windows 10.

Sample log file

The sample log file I am importing looks like this:

07.02.2017 14:16:42 - Critical - General - Ähnlicher Fehler mit übermäßger Ödnis
08.02.2017 14:13:52 - Critical - General - ästhetisch überfällige Fleißarbeit

Working configuration

For starters, I tried the following simple Logstash configuration (it's running on Windows, so don't get confused by the mixed slashes ;)

input {
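    # read every matching log file from the beginning and track progress in the sincedb file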
    file {
        path => "D:/logstash/bin/*.log"
        sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
        ignore_older => 999999999999
        start_position => "beginning"
        stat_interval => 60
        type => "clientlogs"
    }
}
output {
    if [type] == "clientlogs" {
        elasticsearch {
            index => "logstash-clientlogs"
        }
    }
}

This works well: I can see the input being read in line by line into the index I specified, and when I check with Kibana, those two lines look like this (I only omitted the hostname): [image: two documents shown in the Kibana UI]

More complex (not working) configuration

But of course this is still very flat data, and I really want to extract a proper timestamp and the other fields from my lines and replace @timestamp and message with them; so I inserted some filter logic involving grok-, mutate- and date-filters between input and output, and the resulting configuration looks like this:

input {
    file {
        path => "D:/logs/*.log"
        sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
        ignore_older => 999999999999
        start_position => "beginning"
        stat_interval => 60
        type => "clientlogs"
    }
}
filter {
  if [type] == "clientlogs" {
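    # split each line into date parts, severity, granularity and the actual log message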
    grok {
        match => [ "message", "%{MONTHDAY:monthday}.%{MONTHNUM2:monthnum}.%{YEAR:year} %{TIME:time} - %{WORD:severity} - %{WORD:granularity} - %{GREEDYDATA:logmessage}" ]
    }
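    # assemble a single timestamp field from the date parts and replace message with just the log text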
    mutate {
       add_field => { 
        "timestamp" => "%{year}-%{monthnum}-%{monthday} %{time}"
        }
        replace => [ "message", "%{logmessage}" ]
        remove_field => ["year", "monthnum", "monthday", "time", "logmessage"]
    }
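    # interpret the assembled timestamp as Vienna local time and write it to @timestamp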
    date {
        locale => "en"
        match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
        timezone => "Europe/Vienna"
        target => "@timestamp"
        add_field => { "debug" => "timestampMatched"}
   }
  }
}
output {
    if [type] == "clientlogs" {
        elasticsearch {
            index => "logstash-clientlogs"
        }
    }
}

Now when I look at these logs with Kibana, I can see that the fields I wanted to add do appear and that timestamp and message are replaced correctly, but my umlauts are all gone: [image: two documents in the Kibana UI where the umlauts were not recognized and show up as replacement characters]

Forcing the charset in input and output

I have also tried setting

codec => plain {
    charset => "UTF-8"
}

for both input and output, but that did not change anything either.
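For reference, here is a sketch of how the codec was placed inside the file input; the same codec line also went into the elasticsearch output:

input {
    file {
        path => "D:/logs/*.log"
        sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
        ignore_older => 999999999999
        start_position => "beginning"
        stat_interval => 60
        type => "clientlogs"
        # force the input to be decoded as UTF-8 instead of the platform default
        codec => plain {
            charset => "UTF-8"
        }
    }
}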

Different output types

When I change the output to stdout { }, the output seems fine:

2017-02-07T13:16:42.000Z MYPC Ähnlicher Fehler mit übermäßger Ödnis
2017-02-08T13:13:52.000Z MYPC ästhetisch überfällige Fleißarbeit
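For that test I had simply swapped the elasticsearch block in the output for stdout { }. A variant with the rubydebug codec (a sketch, not what produced the lines above) prints every field of each event, which makes it easier to see at which stage the encoding breaks:

output {
    if [type] == "clientlogs" {
        # print complete events, including all parsed fields, to the console
        stdout { codec => rubydebug }
    }
}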

Querying without Kibana

I also queried the index with this PowerShell command:

Invoke-WebRequest -Method POST -Uri 'http://localhost:9200/logstash-clientlogs/_search' -Body '
{
  "query":
  {
    "regexp": {
      "message" : ".*" 
    }
  }
}
' | select -ExpandProperty Content

But it also returns the same garbled content that Kibana revealed:

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8URonc
bfBgFwC","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�hnlicher Fehler mit �berm��ger �dnis\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-07T13:16:42.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-07 14:16:42"}},{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8UR
oncbfBgFwD","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�sthetisch �berf�llige Flei�arbeit\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-08T13:13:52.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-08 14:13:52"}}]}}

Has anyone else experienced this and found a solution for this use case? I don't see any setting on grok to specify an encoding (the files I am passing in are UTF-8 with BOM), and setting an encoding on the input itself doesn't seem necessary, because I get the correct messages when I leave out the filter.

0 Answers:

No answers yet.