I have a log file containing the following data.
2014-10-19 17:30:25:
Creating destination directory: "\master1\users\jamesk\Java\chapter05\tech-support-complete\doc\"
Loading source file Error \\master1\users\jamesk\Java\chapter05\tech-support-complete\JamesKohout.java...
onstructing Javadoc information...Error
31 Error Standard Doclet version 1.6.0_26 Error
-encoding Error
19 windows-1252
20 -charset Error
21 windows-1252
22 -docletpath
2014-10-19 18:30:25:
Creating destination directory: "\master1\users\jamesk\Java\chapter05\tech-support-complete\doc\"
Loading source file Error \\master1\users\jamesk\Java\chapter05\tech-support-complete\JamesKohout.java...
onstructing Javadoc Error information...
31 Standard Doclet version 1.6.0_26 Error
-encoding Error
19 windows-1252
20 -charset Error
21 windows-1252
22 -docletpath
2014-10-19 19:30:25:
Creating destination directory: "\master1\users\jamesk\Java\chapter05\tech-support-complete\doc\"
Loading source file Error \\master1\users\jamesk\Java\chapter05\tech-support-complete\JamesKohout.java...
onstructing Javadoc information...Error
31 Standard Doclet version 1.6.0_26 Error
-encoding
19 windows-1252
20 -charset Error
21 windows-1252
22 -docletpath
2014-10-19 20:30:25:
Creating destination directory:Error "\master1\users\jamesk\Java\chapter05\tech-support-complete\doc\"
Loading source file Error \\master1\users\jamesk\Java\chapter05\tech-support-complete\JamesKohout.java...
onstructing Javadoc information...
31 Standard Doclet version 1.6.0_26 Error
-encoding Error
19 windows-1252
20 -charset Error
21 windows-1252 Error
22 Error -docletpath
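I want to write a script in Unix shell or Python that searches for the word "Error" and reports its word count for each timestamp in the log file above. The file contains data for different time intervals. The word Error has a count of 6 in the first interval, 5 in the second, and so on. I want the output to be: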
2014-10-19 17:30:25: Error Count=6
2014-10-19 18:30:25: Error Count=5
2014-10-19 19:30:25: Error Count=4
2014-10-19 20:30:25: Error Count=7
I tried the following command, but it only gives the total count for the whole file.
grep -i "Error" | wc -l
Please help. Thanks.
Answer 0 (score: 2)
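import re

# Match either a timestamp such as "2014-10-19 17:30:25:" or the word "Error".
pattern = re.compile(r"\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}:|Error", re.IGNORECASE)
d = {}
last = None
for token in pattern.findall(x):
    if token.lower() != "error":
        d[token] = 0          # a new timestamp starts a new interval
        last = token
    elif last is not None:
        d[last] += 1          # attribute the match to the most recent timestamp
for stamp, count in d.items():  # dicts keep insertion order in Python 3.7+
    print(stamp, "Error Count=%d" % count)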
Here x is your data, for example the result of file.read().
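If the log is large, reading it all at once with file.read() may be undesirable. The same idea can be sketched line by line, as below. The names `count_errors_by_timestamp` and `TIMESTAMP` are made up for this illustration, the pattern assumes each timestamp sits alone on its line as in the sample log, and the match is case-sensitive.

```python
import re

# Assumed format: a timestamp alone on its line, e.g. "2014-10-19 17:30:25:"
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\s*$")


def count_errors_by_timestamp(lines, word="Error"):
    """Return a list of (timestamp, count) pairs in the order they appear."""
    results = []
    for line in lines:
        if TIMESTAMP.match(line):
            results.append([line.strip(), 0])   # start a new interval
        elif results:
            results[-1][1] += line.count(word)  # every occurrence on the line
    return [(stamp, n) for stamp, n in results]


# Abridged sample data in the same shape as the log in the question.
sample = [
    "2014-10-19 17:30:25:\n",
    "31 Error Standard Doclet version 1.6.0_26 Error\n",
    "-encoding Error\n",
    "2014-10-19 18:30:25:\n",
    "19 windows-1252\n",
]
for stamp, n in count_errors_by_timestamp(sample):
    print(stamp, "Error Count=%d" % n)
```

Because the function only iterates over `lines`, an open file object can be passed directly, e.g. `with open("logfile") as fh: count_errors_by_timestamp(fh)`.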
Answer 1 (score: 2)
This is easy to do with Awk.
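awk '/^[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9] [012][0-9]:[0-5][0-9]:[0-6][0-9]:/ {
    t = $0; e[t] += 0; next }            # new timestamp: start its counter at 0
t != "" { e[t] += gsub(/Error/, "&") }   # gsub returns the number of matches on this line
END { for (s in e) print s " Error Count=" e[s] }' logfile | sort   # for-in order is unspecified, so sort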
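One detail worth checking against the sample log: incrementing a counter once per line that contains Error is not the same as counting occurrences, because one line in the first interval contains the word twice. A small Python sketch of the difference, using the first interval's lines (the path in the "Loading source file" line is abridged here for readability):

```python
# Lines of the first interval (after the 17:30:25 timestamp); path abridged.
interval = [
    "Loading source file Error JamesKohout.java...",
    "onstructing Javadoc information...Error",
    "31 Error Standard Doclet version 1.6.0_26 Error",
    "-encoding Error",
    "19 windows-1252",
    "20 -charset Error",
    "21 windows-1252",
    "22 -docletpath",
]

# One increment per matching line.
lines_with_error = sum(1 for line in interval if "Error" in line)

# One increment per occurrence, which is what the question asks for.
occurrences = sum(line.count("Error") for line in interval)

print(lines_with_error)  # 5
print(occurrences)       # 6
```

The expected count of 6 for the first interval is only reached when occurrences are counted.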
Answer 2 (score: 1)
Straight awk:
awk '/^201[0-9].*:/{if (seen) print cont; seen=1; cont=0; printf "%s Error Count=", $0; next} {cont+=gsub(/Error/,"&")} END{if (seen) print cont}' infile
Explanation of the code:
awk '/^201[0-9].*:/{                 # Timestamp pattern reached
    if (seen) print cont             # finish the previous timestamp line with its count (even if zero)
    seen=1
    cont=0                           # reset the counter for the current timestamp
    printf "%s Error Count=", $0     # print the timestamp WITHOUT a line break
    next
}
{
    cont+=gsub(/Error/,"&")          # add every occurrence of Error on this line
}
END{
    if (seen) print cont             # print the count for the last timestamp
}' infile
Answer 3 (score: 0)
Here is a Python approach:
>>> import re
>>> f = open('logfile').readlines()
>>> i = 0
>>> while True:
...     if i+10 > len(f):
...         break
...     # assumes each interval is exactly 10 lines (1 timestamp + 9 entries)
...     tmp = len(re.findall('Error', "".join(f[i+1:i+10])))
...     print(f[i].strip() + " Error-Count=" + str(tmp))
...     i += 10
...
2014-10-19 17:30:25: Error-Count=6
2014-10-19 18:30:25: Error-Count=5
2014-10-19 19:30:25: Error-Count=4
2014-10-19 20:30:25: Error-Count=7
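The loop above steps through the file 10 lines at a time, so it relies on every interval having exactly that length. If intervals can vary in size, one alternative is to split the text on the timestamp lines themselves. The sketch below is only an illustration: `STAMP` and `error_counts` are hypothetical names, and the timestamp format is assumed from the sample log.

```python
import re

# Assumed timestamp format, taken from the sample log: "2014-10-19 17:30:25:"
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:)[ \t]*$", re.MULTILINE)


def error_counts(text, word="Error"):
    """Split text on timestamp lines and count `word` in each block."""
    parts = STAMP.split(text)
    # parts = [text before first stamp, stamp1, block1, stamp2, block2, ...]
    stamps, blocks = parts[1::2], parts[2::2]
    return [(s, b.count(word)) for s, b in zip(stamps, blocks)]


# Made-up sample with intervals of different lengths.
text = (
    "2014-10-19 17:30:25:\n"
    "-encoding Error\n"
    "20 -charset Error\n"
    "2014-10-19 18:30:25:\n"
    "-encoding Error\n"
    "19 windows-1252\n"
    "20 -charset\n"
    "21 windows-1252 Error Error\n"
)
for stamp, n in error_counts(text):
    print(stamp + " Error-Count=" + str(n))
```

Because the pattern has one capturing group, `split` keeps the timestamps in the result, which is what lets them be paired with the blocks that follow them.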