我有以下格式的字典 {str:[datetime_object]}
示例:
test_data={'127.0.0.1':[datetime.datetime(2016, 5, 31, 2, 3, 48), datetime.datetime(2016, 5, 31, 3, 0, 53)],
'127.0.0.2': [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
'127.0.0.3': [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
'127.0.0.4': [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
'127.0.0.5': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.6': [datetime.datetime(2016, 5, 31, 2, 21, 6)],
'127.0.0.7': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.8': [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
'127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
'127.0.0.10': [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
'127.0.0.11': [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
'127.0.0.12': [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}
这些条目是从每个IP收到的请求日期时间
我需要计算每个IP每小时的平均请求次数
我当前的尝试在此代码处结束
def count_accesses():
for key, value in ip_request_datetime_dict.items():
for recived in value:
yield recived.hour
for x in count_accesses():
print(x)
以上基于此解决方案的代码 How to count accesses per hour from log file entries?
正确的解决方案输出可能是包含比率的字典。例如:
此127.0.0.1的平均请求速率为每小时2个请求,因为您仍然可以看到02:03:48-> 03:00:53
此127.0.0.2的平均请求速率为每小时3个请求
ip_hit_rate = {'127.0.0.1':2, '127.0.0.2':3, '127.0.0.3':2 '127.0.0.4':2 '127.0.0.5':1, '127.0.0.6':1}
非常感谢您的帮助
答案 0 :(得分:1)
使用itertools.groupby
:
import itertools
res = {}
for k,v in test_data.items():
counts = [len(list(g)) for _, g in itertools.groupby(sorted(v), lambda x:(x.year, x.month, x.day, x.hour))]
res[k] = round(sum(counts)/len(counts))
输出:
{'127.0.0.1': 1,
'127.0.0.10': 2,
'127.0.0.11': 2,
'127.0.0.12': 2,
'127.0.0.2': 3,
'127.0.0.3': 2,
'127.0.0.4': 2,
'127.0.0.5': 1,
'127.0.0.6': 1,
'127.0.0.7': 1,
'127.0.0.8': 2,
'127.0.0.9': 2}
答案 1 :(得分:0)
这是我每小时计算请求的方式:
import datetime
test_data={'127.0.0.1':[datetime.datetime(2016, 5, 28, 2, 3, 48), datetime.datetime(2016, 5, 31, 2, 3, 53)],
'127.0.0.2': [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
'127.0.0.3': [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
'127.0.0.4': [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
'127.0.0.5': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.6': [datetime.datetime(2016, 5, 31, 2, 21, 6)],
'127.0.0.7': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.8': [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
'127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
'127.0.0.10': [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
'127.0.0.11': [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
'127.0.0.12': [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}
def delta_to_hours(delta):
return delta.days * 24 + delta.seconds / 3600
def calc_rate(values):
num = len(values)
if num <= 1:
return 1
diff = delta_to_hours(values[-1] - values[0])
return num / diff
rates = {key:calc_rate(value) for key,value in test_data.items()}
print(rates)
输出为:
{
'127.0.0.1': 0.02777724195135125,
'127.0.0.2': 0.1925248083665102,
'127.0.0.3': 1440.0,
'127.0.0.4': 1440.0,
'127.0.0.5': 1,
'127.0.0.6': 1,
'127.0.0.7': 1,
'127.0.0.8': 1800.0,
'127.0.0.9': 116.12903225806451,
'127.0.0.10': 57.599999999999994,
'127.0.0.11': 56.25,
'127.0.0.12': 900.0
}
答案 2 :(得分:0)
itertools.groupby
很酷,但有时似乎很难理解。
collections.Counter
可能也对您有所帮助,并且更易于使用:
import datetime
from collections import Counter
test_data={'127.0.0.1':[datetime.datetime(2016, 5, 28, 2, 3, 48), datetime.datetime(2016, 5, 31, 2, 3, 53)],
'127.0.0.2': [datetime.datetime(2016, 5, 30, 0, 15, 10), datetime.datetime(2016, 5, 31, 2, 18, 29), datetime.datetime(2016, 5, 31, 2, 18, 41), datetime.datetime(2016, 5, 31, 2, 18, 49), datetime.datetime(2016, 5, 31, 2, 21, 32), datetime.datetime(2016, 5, 31, 2, 21, 40), datetime.datetime(2016, 5, 31, 2, 21, 46), datetime.datetime(2016, 5, 31, 2, 22), datetime.datetime(2016, 5, 31, 23, 0, 0)],
'127.0.0.3': [datetime.datetime(2016, 5, 31, 2, 19, 34), datetime.datetime(2016, 5, 31, 2, 19, 39)],
'127.0.0.4': [datetime.datetime(2016, 5, 31, 2, 20, 36), datetime.datetime(2016, 5, 31, 2, 20, 41)],
'127.0.0.5': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.6': [datetime.datetime(2016, 5, 31, 2, 21, 6)],
'127.0.0.7': [datetime.datetime(2016, 5, 31, 2, 21, 5)],
'127.0.0.8': [datetime.datetime(2016, 5, 31, 2, 21, 34), datetime.datetime(2016, 5, 31, 2, 21, 38)],
'127.0.0.9': [datetime.datetime(2016, 5, 31, 2, 22, 3), datetime.datetime(2016, 5, 31, 2, 23, 5)],
'127.0.0.10': [datetime.datetime(2016, 5, 31, 2, 10, 22), datetime.datetime(2016, 5, 31, 2, 12, 27)],
'127.0.0.11': [datetime.datetime(2016, 5, 31, 3, 11, 46), datetime.datetime(2016, 5, 31, 3, 13, 54)],
'127.0.0.12': [datetime.datetime(2016, 5, 31, 3, 13, 9), datetime.datetime(2016, 5, 31, 3, 13, 17)]}
def extract_hour(d: datetime.datetime):
return d.date(), d.hour
result = {}
for k, v in test_data.items():
cnt = Counter(map(extract_hour, v))
result[k] = sum(cnt.values()) / len(cnt)
print(result)
将输出
{
'127.0.0.1': 1.0,
'127.0.0.2': 3.0,
'127.0.0.3': 2.0,
'127.0.0.4': 2.0,
'127.0.0.5': 1.0,
'127.0.0.6': 1.0,
'127.0.0.7': 1.0,
'127.0.0.8': 2.0,
'127.0.0.9': 2.0,
'127.0.0.10': 2.0,
'127.0.0.11': 2.0,
'127.0.0.12': 2.0
}
答案 3 :(得分:0)
这是我一小时的解决方案:
result={}
for k in test_data.keys():
ref = test_data[k][0]
counter= []
c = 1
for h in range(1, len(test_data[k])):
if (test_data[k][h] - ref).total_seconds() / 3600 < 1.0:
c = c + 1
else:
counter.append(c)
c = 1
ref = test_data[k][h]
if h == len(test_data[k])-1:
counter.append(c)
result[k] = float(c) if len(counter) == 0 else float(sum(counter)) / len(counter)
print(result)
输出:
{'127.0.0.1': 2.0,
'127.0.0.2': 3.0,
'127.0.0.3': 2.0,
'127.0.0.4': 2.0,
'127.0.0.5': 1.0,
'127.0.0.6': 1.0,
'127.0.0.7': 1.0,
'127.0.0.8': 2.0,
'127.0.0.9': 2.0,
'127.0.0.10': 2.0,
'127.0.0.11': 2.0,
'127.0.0.12': 2.0}