This is the format of my data:
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
Here is my code, in which I am trying to show the number of lines per date:
```python
# datecount.py
import sys, collections

# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] is optional and will be the file name

# set input based on number of arguments
if len(sys.argv) == 1:
    f = sys.stdin
elif len(sys.argv) == 2:
    try:
        f = open(sys.argv[1])
    except IOError:
        print "Cannot open", sys.argv[1]
        sys.exit()
else:
    print "USAGE: python datecount [FILE]"
    sys.exit()

dateCounts = collections.Counter()

# for every line passed into the script
for line in f:
    # find indices of date section
    start = line.find("[")
    if start >= 0:
        end = line.find("]", start)
        # grab just the date
        date = line[start+21:end]  # by YEAR
        dateCounts[date] = dateCounts[date] + 1

# print top dates
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")
```
The current output is:
('2017', 738057)
('2016', 446204)
('2015', 9995)
('2014', 706)
But I only want to count by date, for example:
('May 02 2016', 128)
('May 03 2016', 105)
('May 04 2016', 99)
I was thinking of using a regular expression but don't know how.
How can I get rid of the time of day in the middle of the date?
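For reference, the bracketed timestamp in this log format appears to be fixed width, so each field sits at a constant offset from the opening `[`. The Python 3 sketch below shows which slice returns which field, which is why `line[start+21:end]` in the question's code yields only the year. The function name `timestamp_fields` is hypothetical, and the offsets assume the day is always two digits, as in the sample line.

```python
def timestamp_fields(line):
    """Slice a leading "[Www Mmm DD HH:MM:SS YYYY]" timestamp into its fields.

    Assumes the fixed-width layout of the sample line (two-digit day).
    """
    start = line.find("[")
    end = line.find("]", start)
    return {
        "weekday": line[start + 1:start + 4],      # "Mon"
        "month_day": line[start + 5:start + 11],   # "May 02"
        "time": line[start + 12:start + 20],       # "15:38:50"
        "year": line[start + 21:end],              # "2016"
    }


# Sample line in the format shown in the question
sample = "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml"
fields = timestamp_fields(sample)
print(fields["month_day"] + " " + fields["year"])  # May 02 2016
```

Joining the month/day slice with the year slice gives the per-date key the question asks for, with the time of day skipped.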
Answer 0 (score: 0)
The following code produces the expected result. I hope this helps.
```python
# datecount.py
import sys, collections

# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] is optional and will be the file name

# set input based on number of arguments
if len(sys.argv) == 1:
    f = sys.stdin
elif len(sys.argv) == 2:
    try:
        f = open(sys.argv[1])
    except IOError:
        print "Cannot open", sys.argv[1]
        sys.exit()
else:
    print "USAGE: python datecount [FILE]"
    sys.exit()

dateCounts = collections.Counter()

# for every line passed into the script
for line in f:
    # find indices of date section
    start = line.find("[")
    if start >= 0:
        end = line.find("]", start)
        # grab month and day (e.g. "May 02"), then append the year (e.g. "2016")
        date = line[start+5:start+11] + ' ' + line[start+21:end]  # by month, day and year
        dateCounts[date] = dateCounts[date] + 1

# print top dates
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")
```
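An alternative to hard-coded offsets is to parse the timestamp with `datetime.strptime` and re-format it with `strftime`, which also skips lines whose first `[...]` is not a timestamp. This is a minimal Python 3 sketch, not the answerer's code; the helper names `date_key` and `count_by_date` and the sample lines are hypothetical, and `%a`/`%b` match English day and month names only under the default C locale.

```python
import collections
from datetime import datetime


def date_key(line):
    """Return a 'May 02 2016'-style key for a log line, or None if there is no timestamp."""
    start = line.find("[")
    if start < 0:
        return None
    end = line.find("]", start)
    if end < 0:
        return None
    try:
        stamp = datetime.strptime(line[start + 1:end], "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None  # the first [...] was not a timestamp
    return stamp.strftime("%b %d %Y")  # keep the date, drop the time of day


def count_by_date(lines):
    """Count lines per calendar date, skipping lines without a timestamp."""
    counts = collections.Counter()
    for line in lines:
        key = date_key(line)
        if key is not None:
            counts[key] += 1
    return counts


# Hypothetical sample lines
log = [
    "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Mon May 02 16:01:12 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Tue May 03 09:15:00 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
]
for date, n in count_by_date(log).most_common():
    print((date, n))
```

Because iterating a file object yields its lines, `count_by_date(f)` works directly on the `f` opened in the script above.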
Answer 1 (score: 0)
An implementation using a regular expression:
```python
import sys
import collections
import re

dateCounts = collections.Counter()
input_str = """
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
"""
# raw string so that \[ and \] reach the regex engine unchanged
found = re.findall(r"\[(.*)\].*\[.*\].*\[.*\].*", input_str, re.MULTILINE)
for date in found:
    dateCounts[date] = dateCounts[date] + 1
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")
```
Output:
('Mon May 02 15:38:50 2016', 2)
('Mon May 03 15:38:50 2017', 1)
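The regex above still keeps the time of day in each key, so lines logged at different seconds of the same day are counted separately. One way to drop it, sketched here in Python 3, is to capture only the month, day, and year from the leading timestamp. The pattern name `STAMP` and the function `count_by_date` are hypothetical, and the pattern assumes the exact layout of the sample lines (three-letter names, two-digit day, timestamp at the start of the line).

```python
import collections
import re

# Hypothetical pattern for a leading "[Www Mmm DD HH:MM:SS YYYY]" timestamp:
# month, day and year are captured; weekday and time are matched but discarded.
STAMP = re.compile(r"\[\w{3} (\w{3}) (\d{2}) \d{2}:\d{2}:\d{2} (\d{4})\]")


def count_by_date(lines):
    """Count lines per 'Mmm DD YYYY' date, ignoring lines without a timestamp."""
    counts = collections.Counter()
    for line in lines:
        m = STAMP.match(line)  # match() only tries the start of the line
        if m:
            counts[" ".join(m.groups())] += 1
    return counts


input_str = """
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
"""
for date, n in count_by_date(input_str.splitlines()).most_common():
    print((date, n))
```

For a real log, pass the open file object (or `sys.stdin`) instead of `input_str.splitlines()`; with this sample the keys become `'May 02 2016'` and `'May 03 2017'`.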