I'm trying to import historical log data into ElasticSearch (version 5.2.2) with Logstash (version 5.2.1) - all of this running under Windows 10.
A sample of the log files I'm importing looks like this:
07.02.2017 14:16:42 - Critical - General - Ähnlicher Fehler mit übermäßger Ödnis
08.02.2017 14:13:52 - Critical - General - ästhetisch überfällige Fleißarbeit
For starters I tried the following simple Logstash configuration (it's running on Windows, so don't get confused by the mixed slashes ;)):
input {
  file {
    path => "D:/logstash/bin/*.log"
    sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
    ignore_older => 999999999999
    start_position => "beginning"
    stat_interval => 60
    type => "clientlogs"
  }
}
output {
  if [type] == "clientlogs" {
    elasticsearch {
      index => "logstash-clientlogs"
    }
  }
}
This works fine - I can see the input being read line by line into the index I specified, and when I check with Kibana, those two lines show up as expected (I've only omitted the hostname).
But of course this is still very flat data, and I really want to extract the proper timestamp and the other fields from my lines and replace @timestamp and message with them; so I inserted some filter logic involving grok-, mutate- and date-filters between input and output, and the resulting configuration looks like this:
input {
  file {
    path => "D:/logs/*.log"
    sincedb_path => "C:\logstash\bin\file_clientlogs_lastrun"
    ignore_older => 999999999999
    start_position => "beginning"
    stat_interval => 60
    type => "clientlogs"
  }
}
filter {
  if [type] == "clientlogs" {
    grok {
      match => [ "message", "%{MONTHDAY:monthday}.%{MONTHNUM2:monthnum}.%{YEAR:year} %{TIME:time} - %{WORD:severity} - %{WORD:granularity} - %{GREEDYDATA:logmessage}" ]
    }
    mutate {
      add_field => {
        "timestamp" => "%{year}-%{monthnum}-%{monthday} %{time}"
      }
      replace => [ "message", "%{logmessage}" ]
      remove_field => ["year", "monthnum", "monthday", "time", "logmessage"]
    }
    date {
      locale => "en"
      match => ["timestamp", "YYYY-MM-dd HH:mm:ss"]
      timezone => "Europe/Vienna"
      target => "@timestamp"
      add_field => { "debug" => "timestampMatched" }
    }
  }
}
output {
  if [type] == "clientlogs" {
    elasticsearch {
      index => "logstash-clientlogs"
    }
  }
}
Now when I look at those logs with Kibana, I can see that the fields I wanted to add do show up and that timestamp and message get replaced correctly - but my umlauts are all gone.
I also tried setting
codec => plain {
  charset => "UTF-8"
}
for both input and output, but that didn't change anything either.
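To be explicit about the placement - roughly like this on the file input (and the same codec block on the elasticsearch output):
input {
  file {
    # ... same file settings as above ...
    codec => plain {
      charset => "UTF-8"
    }
  }
}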
When I change the output to stdout { }, however, the output seems fine:
2017-02-07T13:16:42.000Z MYPC Ähnlicher Fehler mit übermäßger Ödnis
2017-02-08T13:13:52.000Z MYPC ästhetisch überfällige Fleißarbeit
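(For that test I just swapped out the elasticsearch block, roughly like this:)
output {
  if [type] == "clientlogs" {
    stdout { }
  }
}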
I also queried the index with this PowerShell command:
Invoke-WebRequest -Method POST -Uri 'http://localhost:9200/logstash-clientlogs/_search' -Body '
{
  "query":
  {
    "regexp": {
      "message" : ".*"
    }
  }
}
' | select -ExpandProperty Content
But it, too, returns the same garbled content that Kibana shows:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8URonc
bfBgFwC","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�hnlicher Fehler mit �berm��ger �dnis\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-07T13:16:42.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-07 14:16:42"}},{"_index":"logstash-clientlogs","_type":"clientlogs","_id":"AVskdTS8UR
oncbfBgFwD","_score":1.0,"_source":{"severity":"Critical","debug":"timestampMatched","message":"�sthetisch �berf�llige Flei�arbeit\r","type":"clientlogs","path":"D:/logs/Client.log","@timestamp":"2017-02-08T13:13:52.000Z","granularity":"General","@version":"1","host":"MYPC","timestamp":"2017-02-08 14:13:52"}}]}}
Has anyone else run into this and found a solution for this use case? I don't see any setting on grok for specifying an encoding (the file I'm passing in is UTF-8 with BOM), and setting the encoding on the input itself doesn't seem necessary, since it gives me the correct messages when I leave out the filter.