I want to process nginx logs that look like this:
ip - - [18/Dec/2016:06:44:41 +0300] "GET /some/part/thing HTTP/1.1" 200 4320 "https://referrer" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
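Currently I read these logs line by line and use grep -E -o to pick out the details I need (ip, datetime, part, http_code, bandwidth, referrer).
With the volume of logs I have, this is very slow. Is it possible to apply a regexp (with lookbehind and so on) to a whole log file at once instead of to each line? I would also like to run this on the GPU.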
Update:
while IFS= read -r line || [ "$line" ]; do
    cnt=$((cnt+1))
    # we will match each piece of information up to the first space,
    # then remove that info from the line before matching the next one.
    # $line stays unquoted in the echo calls: word splitting trims the leading
    # spaces left behind by each removal, which the '^\S+' patterns rely on.
    ip=""
    datetime=""
    slug=""
    http_code=""
    chunk_size=""
    referrer=""
    # Input 1. ip address
    ip=$(echo $line | grep -E -o '^\S+') || parse_error "ip address" "$1" "$cnt"
    # remove only the first occurrence, matched literally
    line=${line/"$ip"}
    # remove trash - -
    trash=$(echo $line | grep -E -o -e '-\s-') || parse_error "- - trash" "$1" "$cnt"
    line=${line/"$trash"}
    # Input 2. datetime
    datetime=$(echo $line | grep -E -o '[0-9]{2}/[A-Za-z]+/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}\s\+[0-9]{4}') || parse_error "datetime" "$1" "$cnt"
    line=${line/"$datetime"}
    # Input 3. slug (the pipeline returns sed's status, so test for an empty result instead)
    slug=$(echo $line | grep -o '[a-zA-Z0-9]*/mp4' | sed -e 's/\/mp4//')
    [ -n "$slug" ] || parse_error "stream slug" "$1" "$cnt"
    # remove trash
    trash=$(echo $line | grep -E -o '^.*HTTP/[0-2]\.[0-9]"\s') || parse_error "full HTTP GET req" "$1" "$cnt"
    line=${line/"$trash"}
    # Input 4. http code
    http_code=$(echo $line | grep -E -o '^\S+') || parse_error "http code" "$1" "$cnt"
    line=${line/"$http_code"}
    # this can be checked only here with regex above :(
    if [ "$http_code" != "200" ] && [ "$http_code" != "206" ]; then
        # continue to next line, skip this one, because only HTTP 200, 206 req are acceptable.
        continue
    fi
    # Input 5. http payload chunk size
    chunk_size=$(echo $line | grep -E -o '^\S+') || parse_error "chunk size" "$1" "$cnt"
    line=${line/"$chunk_size"}
    # Input 6. Referrer
    referrer=$(echo $line | grep -Po -m 1 '"\K[^"]*' | head -1) || parse_error "referrer" "$1" "$cnt"
    # Handle cases when regex in Input 6 fails to match http referer, and match some trash instead
    if [[ $referrer != "http"* ]]; then
        referrer=""
    fi
    wait
    # printf "$ip,$datetime,$slug,$http_code,$chunk_size,%s\n" $referrer >> ./$3.csv || echo "[-] Can not write to .csv, something is bad at $cnt."
    string+="$ip,$datetime,$slug,$http_code,$chunk_size,$referrer\n" || echo "[-] Can not put to string, something is bad at $cnt."
done < "$1"
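For comparison, here is a minimal sketch of the same extraction done in one pass over the whole file, instead of starting several grep processes for every line. It is only an illustration under assumptions: the function name parse_log is hypothetical, it uses awk rather than grep, and it assumes every line follows the log format shown at the top (so that splitting on double quotes yields the request, the status and size, and the referrer). Lines without a slug are reported on stderr and skipped, which may not be what parse_error does.

```shell
# Hypothetical helper: parse_log is not part of the original script.
# Prints ip,datetime,slug,http_code,chunk_size,referrer for each 200/206 line.
parse_log() {
  awk -F'"' '
    {
      # Fields when splitting on double quotes:
      #   $1: ip - - [datetime]   $2: request line
      #   $3: status and size     $4: referrer
      split($1, head, " ")
      split($3, stat, " ")
      ip = head[1]
      # Strip the surrounding [ and ] from the timestamp.
      datetime = substr(head[4], 2) " " substr(head[5], 1, length(head[5]) - 1)
      code = stat[1]
      size = stat[2]

      # Keep only HTTP 200 and 206 responses, as in the loop above.
      if (code != "200" && code != "206") next

      # Slug: the alphanumeric path segment directly before "/mp4".
      if (!match($2, /[A-Za-z0-9]+\/mp4/)) {
        printf "line %d: no stream slug\n", NR > "/dev/stderr"
        next
      }
      slug = substr($2, RSTART, RLENGTH - 4)

      # Blank the referrer unless it starts with "http".
      ref = ($4 ~ /^http/) ? $4 : ""

      print ip "," datetime "," slug "," code "," size "," ref
    }
  ' "$1"
}
```

It could be called as parse_log access.log > out.csv, where both file names are placeholders. Unlike the loop, this checks the status code before looking for the slug, so non-200/206 lines are dropped without any slug error.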