Logstash with multiple log formats

Asked: 2015-03-15 19:28:18

Tags: apache logging logstash grok

So, we're looking into some kind of log aggregator, because logs scattered all over the place just aren't scaling. I've been looking at Logstash and managed to get an instance with Kibana up and running last night, but there are a few problems. For example, geoip was using the domain name from the httpd (I believe they're Apache) logs.

Anyway, now I want to pull in the logs from our other web servers, and there's something I can't figure out: will I need to define patterns for every different log format we use? How is this usually approached: one big logstash.conf file, or some other way?

PS: I realize some of these logs are similar. For example, the error_log files are in almost exactly the same format as one another, and so are the access_logs. So I'm assuming something like this would handle all the *error_log files:

input { 
    file {
        path => "//var/log/httpd/*error_log"
        type => "error_log"
    }
}

filter {
    if [type] == "error_log" {
        grok {
            match => [ "message", "%{COMBINEDAPACHELOG}" ]
        }
    }
}

Anyway, here is a sample line from each of the logs I want to import.

var/log/httpd/access_log:
207.46.13.87 support.mycompany.com - - [15/Mar/2015:07:49:12 -0400] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

var/log/httpd/api-access_log:
192.168.1.5 api.mycompany.com - - [15/Mar/2015:06:50:01 -0400] "GET /diag/heartbeat HTTP/1.0" 502 495 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/api-error_log:
[Sun Mar 15 08:45:06 2015] [error] [client 192.168.1.5] proxy: Error reading from remote server returned by /diag/heartbeat

var/log/httpd/audit_log:
type=USER_END msg=audit(1426380301.674:2285509): user pid=30700 uid=0 auid=0 msg='PAM: session close acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'

var/log/httpd/default-access_log:
74.77.76.4 dc01.mycompany.com - - [15/Mar/2015:09:33:46 -0400] "GET /prod/shared/700943_image003.jpg HTTP/1.1" 200 751 "http://mail.twc.com/do/mail/message/view?msgId=INBOXDELIM18496" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

var/log/httpd/error_log:
[Sun Mar 15 13:54:16 2015] [error] [client 107.72.162.115] File does not exist: /var/www/html/portal-prod/apple-touch-icon.png

var/log/httpd/portal-prod-access_log:
192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/ssl_access_log:
97.77.91.2 - - [15/Mar/2015:10:00:07 -0400] "POST /prod/index.php/api/uploader HTTP/1.1" 200 10

var/log/httpd/ssl_error_log:
[Sun Mar 15 09:00:03 2015] [error] [client 99.187.226.241] client denied by server configuration: /var/www/html/api

var/log/httpd/ssl_request_log:
[15/Mar/2015:11:10:02 -0400] dc01.mycompany.com 216.240.171.98 TLSv1 RC4-MD5 "POST /prod/index.php/api/uploader HTTP/1.1" 7

var/log/httpd/support-access_log:
209.255.201.30 support.mycompany.com - - [15/Mar/2015:04:07:51 -0400] "GET /cron/index.php?/Parser/ParserMinute/POP3IMAP HTTP/1.0" 200 360 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/support-error_log:
[Sun Mar 15 04:05:43 2015] [warn] RSA server certificate CommonName (CN) `portal.mycompany.com' does NOT match server name!?

var/log/httpd/web-prod-access_log:
62.210.141.227 www.mycompany.com - - [15/Mar/2015:04:38:30 -0400] "HEAD /lib/uploadify/uploadify.swf HTTP/1.1" 404 - "http://www.mycompany.com/lib/uploadify/uploadify.swf" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

var/log/httpd/web-prod-error_log:
[Sun Mar 15 04:38:30 2015] [error] [client 62.210.141.227] File does not exist: /var/www/html/web-prod/lib, referer: http://www.mycompany.com/lib/uploadify/uploadify.swf

var/log/cron:
Mar 15 04:30:01 lilo crond[22758]: (root) CMD (/opt/mycompnay/bin/check_replication.sh)

var/log/mysqld.log:
150314  5:07:34 [ERROR] Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'my_database'. Query: 'insert into some_table (column_names) values (values)', Error_code: 1213

var/log/openvpn.log:
Sun Mar 15 13:19:31 2015 Re-using SSL/TLS context
Sun Mar 15 12:23:40 2015 don/50.182.238.21:43315 Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 1024 bit RSA

var/log/maillog:
Mar 15 05:26:45 lilo postfix/qmgr[4428]: 70460B04004: removed
Mar 15 07:06:40 lilo postfix/smtpd[31732]: connect from boots[192.168.1.4]

codeigniter_logs:
DEBUG - 2015-03-15 14:48:29 --> Session class already loaded. Second attempt ignored.
DEBUG - 2015-03-15 14:48:29 --> Helper loaded: url_helper

2 Answers:

Answer 0 (score: 1):

Every log file with a different format will need a different grok pattern. Running these conditionally based on [type] is smart, since it cuts down on processing.

If your logs share the same "prefix" (say, a syslog host/timestamp/priority), you can grok that part out first, then look for the format-specific pieces in what's left over.
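Using the maillog lines from the question as an illustration, that two-pass approach could look something like this (a rough sketch; the field names and the postfix/qmgr condition are illustrative, not taken from the question):

filter {
    # Pass 1: pull apart the syslog-style prefix that every line shares,
    # keeping everything after it in syslog_rest.
    grok {
        match => [ "message", "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:syslog_rest}" ]
    }
    # Pass 2: only now look for program-specific structure in the remainder.
    if [program] == "postfix/qmgr" {
        grok {
            match => [ "syslog_rest", "%{NOTSPACE:queue_id}: %{GREEDYDATA:qmgr_message}" ]
        }
    }
}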

As your config file grows, note that you can split it into multiple files on disk; Logstash stitches them together (in alphabetical order).
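On disk that can end up looking something like this (an illustrative layout, not taken from the question):

/etc/logstash/conf.d/
    10-inputs.conf           # all the file {} inputs
    20-filters-apache.conf   # access/error log groks
    30-filters-syslog.conf   # cron, maillog, openvpn, ...
    90-outputs.conf          # elasticsearch output

started with something like logstash -f /etc/logstash/conf.d/ — the numeric prefixes make the alphabetical merge order explicit.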

Answer 1 (score: 0):

So, the one part that kept nagging at me was the geoip filter, when parsing this line with the COMBINEDAPACHELOG pattern:

192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

It would look up the IP for portal.mycompany.com and use that to determine the location. Matching with the pattern "%{IP:clientip} %{COMBINEDAPACHELOG}" instead takes care of that.

Here is my filter section:

 if [type] == "apache" {
        if [path] =~ "access" and [path] !~ "ssl_access" {
                mutate { replace => { type => "apache_access" } }
                grok {  match => { "message" => "%{IP:clientip} %{COMBINEDAPACHELOG}" } }
                #grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
                date {
                        locale => en
                        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
                }
        } else if [path] =~ "ssl_access" {
                mutate { replace => { type => "apache_access" } }
                grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
                date {
                        locale => en
                        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
                }

        } else if [path] =~ "error" {
                mutate { replace => { type => "apache_error" } }
        }
}

if [agent] != "" {
        useragent { source => "agent" }
}

geoip { source => "clientip" }

I still need to set up a Redis instance to ship logs over to this box from our other DCs, but so far it has performed admirably. Being very specific in the input section also helps a lot.
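To give an idea of what "specific" means, the input side can stay one file block per log family, each tagged with a type for the filters to key off (a minimal sketch; the paths come from the question, the "apache" type matches the filter above, and "postfix" is just an example):

input {
    file {
        path => "/var/log/httpd/*access_log"
        type => "apache"
    }
    file {
        path => "/var/log/httpd/*error_log"
        type => "apache"
    }
    file {
        path => "/var/log/maillog"
        type => "postfix"
    }
}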

I wish there were a pre-packaged ELK stack that included Kibana 4; its UI is much cleaner than Kibana 3's.