I'm new to Piwik and am trying to import a bunch of logs. I need help with the log-format-regex. A sample line from the logs is:
"1.1.1.1" 2.2.2.2 - myuser [09/Dec/2012:04:03:29 -0500] "GET /signon.html HTTP/1.1" 304 "http://www.example.com/example" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"
My log format regex looks like this:
--log-format-regex='\"(?P<ip>\S+)\" \S+ \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] \"\S+ (?P<path>.*?) \S+\" (?P<status>\S+) (?P<length>\S+) \"(?P<referrer>.*?)\" \"(?P<user_agent>.*?)\"'
Every request ends up under "requests ignored" and "invalid log lines". For example:
0 requests imported successfully
0 requests were downloads
236252 requests ignored:
236252 invalid log lines
0 requests done by bots, search engines, ...
0 HTTP errors
0 HTTP redirects
0 requests to static resources (css, js, ...)
0 requests did not match any known site
0 requests did not match any requested hostname
How do I fix the log-format-regex?
TIA, Dan
Answer 0 (score: 3)
When importing Piwik (resp. Matomo) logs, you can pass the --debug option twice, which will show the invalid lines.
Here is an example script that shows it (note that it is written for my preferred log format):
python /opt/piwik.git/misc/log-analytics/import_logs.py \
--debug --debug \
--url=$piwik_site \
--log-format-regex='(?P<host>\S+) (?P<ip>\S+) \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "\S+ (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) "(?P<referrer>.*?)"$' \
--add-sites-new-hosts \
--enable-http-errors \
--enable-http-redirects \
--enable-static \
--strip-query-string \
--show-progress \
--show-progress-delay 2 \
--recorders $cpu \
"$1"
$1 is the name of the file I'm importing (my Apache, Nginx, and Lighttpd boxen all use the same format).
The output will contain several lines like the following:
2013-09-03 19:42:34,145: [DEBUG] Invalid line detected (line did not match): edoceo.com 10.0.0.1 - [03/Sep/2013:16:41:03 -0700] "GET / HTTP/1.1" 301 - "-" "Some Bad Robot v0.1"
Those lines show you exactly what was rejected and give you clues on how to adjust the regex.
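To see where a rejected line stops matching, you can also test the pattern outside the importer. The following is only a rough sketch: it assumes the importer applies the pattern with Python's re module from the start of the line, and longest_matching_prefix is a hypothetical helper name, not something import_logs.py provides. It tries ever shorter prefixes of the pattern until one matches and prints the part of the line that was left over.

```python
import re

def longest_matching_prefix(pattern, line):
    """Return (prefix, rest): the longest compilable prefix of `pattern`
    that matches the start of `line`, plus the unmatched rest of the line.
    Hypothetical debugging helper, not part of import_logs.py."""
    for end in range(len(pattern), 0, -1):
        try:
            compiled = re.compile(pattern[:end])
        except re.error:
            continue  # this prefix cuts through a group or escape; skip it
        m = compiled.match(line)
        if m:
            return pattern[:end], line[m.end():]
    return "", line

# Pattern from the script above and the rejected line from the debug output.
PATTERN = (r'(?P<host>\S+) (?P<ip>\S+) \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
           r'"\S+ (?P<path>.*?) \S+" (?P<status>\d+) (?P<length>\d+) "(?P<referrer>.*?)"$')
LINE = ('edoceo.com 10.0.0.1 - [03/Sep/2013:16:41:03 -0700] '
        '"GET / HTTP/1.1" 301 - "-" "Some Bad Robot v0.1"')

prefix, rest = longest_matching_prefix(PATTERN, LINE)
print("pattern matched up to:", prefix[-20:])
print("line stuck at:", rest)
```

Here the leftover text starts with the "-" logged as the response size of the 301, which \d+ does not accept. Loosening the length group (for example to \S+) would let the line through; how the importer then treats a non-numeric length may depend on your version, so check that before relying on it.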
See the details of my setup.
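Applied to the log line in the question, the same kind of offline check suggests one likely cause: the pattern expects a length field between the status and the referrer, but the sample line has none. This is a sketch under assumptions: the sample line is written with normal access-log spacing, WITH_LENGTH and WITHOUT_LENGTH are names made up for this example, and I have not verified that every version of import_logs.py accepts a format without a length group.

```python
import re

# Sample line from the question, assuming normal access-log spacing.
LINE = ('"1.1.1.1" 2.2.2.2 - myuser [09/Dec/2012:04:03:29 -0500] '
        '"GET /signon.html HTTP/1.1" 304 "http://www.example.com/example" '
        '"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1"')

# Pattern from the question: requires a length field after the status.
WITH_LENGTH = re.compile(
    r'"(?P<ip>\S+)" \S+ \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"')

# Same pattern with the length group removed (illustrative variant).
WITHOUT_LENGTH = re.compile(
    r'"(?P<ip>\S+)" \S+ \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"\S+ (?P<path>.*?) \S+" (?P<status>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)"')

print("with length:   ", WITH_LENGTH.match(LINE))
m = WITHOUT_LENGTH.match(LINE)
print("without length:", m.groupdict() if m else None)
```

If the variant without the length group matches your real lines too, running the import with --debug --debug should confirm whether the importer is happy with it.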