使用awk getline bash在指定的时间范围内从日志文件中提取数据

时间:2013-09-06 18:15:58

标签: bash parsing awk logfiles

我正在搜索解析日志文件,并在此链接中找到了我需要的内容 extract data from log file in specified range of time

但最有用的答案(由@Kent发布):

# this variable you could customize, important is convert to seconds. 
# e.g 5days=$((5*24*3600))
x=$((5*60))   #here we take 5 mins as example

# this line get the timestamp in seconds of last line of your logfile
last=$(tail -n1 logFile|awk -F'[][]' '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; print d;}' )

#this awk will give you lines you needs:
awk -F'[][]' -v last=$last -v x=$x '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; if (last-d<=x)print $0 }' logFile 

我认为错误发生在"date +%s -d ....部分

发出以下错误:

sh: -c: line 0: unexpected EOF while looking for matching `"'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `"'
sh: -c: line 1: syntax error: unexpected end of file

我在此之前花了很多时间试图解决,但没有找到任何解决方案。

crontab将调用该脚本以获取最后1分钟的日志行,并计算一分钟内列出ip的次数,以便我可以检测它是否是攻击。这是另一个任务,希望专家能够帮助在同一个问题中提供所需的代码。(我认为可以用2行解决)。

2 个答案:

答案 0 :(得分:2)

问题可能只是你没有引用你的shell变量。看:

$ foo='ab cd'

$ awk -v bar="$foo" 'BEGIN{print bar}'
ab cd

$ awk -v bar=$foo 'BEGIN{print bar}'
awk: fatal: cannot open file `BEGIN{print bar}' for reading (No such file or directory)

是的,我知道这是一个不同的错误信息 - 当你保留shell变量不加引号时会发生什么事情可以通过任何数量的事情取决于变量的值,目录的内容等等,其中一些非常糟糕,如删除文件系统中的每个文件。

所以,引用你的变量:

-v last="$last" -v x="$x"

然后看看你是否还有问题。

顺便说一下,这里是如何使用带有输入文件http://pastebin.com/BXmS4zLn的GNU awk真正解决原始问题:

$ cat tst.awk
BEGIN {
    ARGV[ARGC++] = ARGV[ARGC-1]

    mths = "JanFebMarAprMayJunJulAugSepOctNovDec"

    if (days)  { hours = days * 24  }
    if (hours) { mins  = hours * 60 }
    if (mins)  { secs  = mins * 60  }
    deltaSecs = secs
}

NR==FNR {
    nr2secs[NR] = mktime($6" "(match(mths,$5)+2)/3" "$4" "gensub(/:/," ","g",$7))
    next
}

nr2secs[FNR] >= (nr2secs[NR-FNR] - deltaSecs)

$ awk -v hours=1 -f tst.awk file
157.55.34.99 - -  06 Sep 2013 09:13:10 +0300  "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - -  06 Sep 2013 09:50:23 +0300  "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

$ gawk -v mins=60 -f tst.awk file
157.55.34.99 - -  06 Sep 2013 09:13:10 +0300  "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - -  06 Sep 2013 09:50:23 +0300  "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

$ gawk -v mins=20 -f tst.awk file
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

你可以指定days =或hours =或mins =或secs = variables,它会做正确的事。

如果您只需要一个脚本来获取最后1分钟的日志行(正如您的问题所述)(现在?),并希望看到一行代码来执行此操作:

$ gawk 'NR==FNR {nr2secs[++nr] = mktime($6" "(match("JanFebMarAprMayJunJulAugSepOctNovDec",$5)+2)/3" "$4" "gensub(/:/," ","g",$7)); next} nr2secs[FNR] >= (nr2secs[nr] - 60)' file file
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

答案 1 :(得分:1)

根据您的输入here,您可以使用以下脚本:

#!/bin/bash

LOGFILE=/path/to/logfile

X=$(( 60 * 60 )) ## 1 Hour

function get_ts {
    DATE="${1%%\]*}"; DATE="${DATE##*\[}"; DATE=${DATE/:/ }; DATE=${DATE//\// }
    TS=$(date -d "$DATE" '+%s')
}

get_ts "$(tail -n 1 "$LOGFILE")"
LAST=$TS

while read -r LINE; do
    get_ts "$LINE"
    (( (LAST - TS) <= X )) && echo "$LINE"
done < "$LOGFILE"

将其保存到文件并更改LOGFILE的值,然后使用bash script.sh运行。

示例输出:

157.55.34.99 - - [06/Sep/2013:09:13:10 +0300] "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - - [06/Sep/2013:09:50:23 +0300] "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"