Counting the number of rows per timestamp

Time: 2017-07-04 05:06:45

Tags: python machine-learning timestamp time-series

I am working with this dataset:

https://pastebin.com/PEFUspiU

I need to group it and count how many requests fall within each time period, so that I can easily plot time against the number of requests.

Example:

**timestamp - number of requests**

21-06-2016 09:00:00 - 2

21-06-2016 10:00:00 - 1

21-06-2016 11:00:00 - 5

How can I compute this?

Thanks

P.S. I tried using data['timestamp'].value_counts() but got an error:

import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6

dateparse = lambda dates: pd.datetime.strptime(dates, '%d-%m-%Y %H:%M:%S')
data = pd.read_csv('/home/amfirnas/Desktop/localhost_access_log.2016-06-21.csv',
                   parse_dates=['timestamp'], index_col='timestamp',date_parser=dateparse)

print data.head(25)

# print data['time'].value_counts()

# print data.groupby(['time']).groups.keys()

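# This is the line that fails: index_col='timestamp' above moved 'timestamp'
# into the index, so it is no longer a column and this lookup raises a KeyError.
ts = data['timestamp'].value_counts()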

# plt.plot(ts)
# plt.show()

2 answers:

Answer 0 (score: 0)

Read the file:

 df = pd.read_csv('/home/local/sayali/Downloads/dataset-server_logs.csv')

[In]:df

              host            timestamp  status   byte
0  192.168.102.100  21-06-2016 09:54:44     200  17811
1  192.168.102.100  21-06-2016 09:54:44     200  21630
2  192.168.100.160  21-06-2016 10:08:08     404   1098
3  192.168.100.160  21-06-2016 11:20:44     200  17811
4  192.168.100.160  21-06-2016 11:20:44     200  21630
5  192.168.102.100  21-06-2016 11:54:44     200  17811
6  192.168.102.100  21-06-2016 11:54:44     200  21630
7  192.168.102.100  21-06-2016 11:54:44     200  21630

ts = pd.DataFrame(df['timestamp'].value_counts())

ts
Out[15]: 
                     timestamp
2016-06-21 11:54:44          3
2016-06-21 09:54:44          2
2016-06-21 11:20:44          2
2016-06-21 10:08:08          1

#Convert index to datetime format using pd.to_datetime()
ts.index = pd.to_datetime(ts.index)

# PLOT
plt.title('Number of Requests based on timestamp') 
plt.xlabel('Timestamp')
plt.ylabel('Total number of Requests') 
#Change xticks orientation to vertical 
plt.xticks(rotation='vertical')        
plt.plot(ts)
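
For reference, here is a minimal self-contained sketch of the approach in this answer. It assumes the eight sample rows shown above, inlined as a string instead of being read from the CSV file, and `plot_counts` is a hypothetical helper name introduced only for illustration. Note that the file is read without `index_col='timestamp'`, so `timestamp` stays a regular column, which is what `df['timestamp']` needs. The sketch also sorts the counts by time, since `value_counts()` orders rows by count and a line plot of unsorted timestamps would zigzag.

```python
import io

import pandas as pd

# Assumption: the eight sample rows from this answer, inlined in place of the CSV.
CSV = """host,timestamp,status,byte
192.168.102.100,21-06-2016 09:54:44,200,17811
192.168.102.100,21-06-2016 09:54:44,200,21630
192.168.100.160,21-06-2016 10:08:08,404,1098
192.168.100.160,21-06-2016 11:20:44,200,17811
192.168.100.160,21-06-2016 11:20:44,200,21630
192.168.102.100,21-06-2016 11:54:44,200,17811
192.168.102.100,21-06-2016 11:54:44,200,21630
192.168.102.100,21-06-2016 11:54:44,200,21630
"""

df = pd.read_csv(io.StringIO(CSV))

# Parse the day-first strings explicitly so that days and months are never swapped.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# value_counts() sorts by count (descending); sort_index() restores time order.
ts = df["timestamp"].value_counts().sort_index()
print(ts)


def plot_counts(counts):
    """Hypothetical helper: plot request counts against their timestamps."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(15, 6))
    ax.plot(counts.index, counts.values, marker="o")
    ax.set_title("Number of Requests based on timestamp")
    ax.set_xlabel("Timestamp")
    ax.set_ylabel("Total number of Requests")
    ax.tick_params(axis="x", labelrotation=90)
    return fig
```

Calling `plot_counts(ts)` and then `plt.show()` should give roughly the same chart as the code above, with the points in chronological order.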


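Answer 1 (score: 0)

If you want the count per hour rather than what value_counts() gives you (a count per exact timestamp), you can group the rows by hour and then count them. To do that, make sure your timestamp column is a pandas datetime: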

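# The timestamps are day-first (21-06-2016), so give the format explicitly
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d-%m-%Y %H:%M:%S')
# Bucket the rows into one-hour bins and count the non-null values per column
df.groupby(pd.Grouper(key='timestamp', freq="1H")).count()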
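
As a rough end-to-end sketch of the hourly grouping, the snippet below uses the eight sample rows from Answer 0, inlined as a string (an assumption in place of the real CSV). It makes two small substitutions that are my own choices rather than part of this answer: `size()` instead of `count()`, so that each hour yields a single number instead of one count per column, and the `"60min"` frequency string, because newer pandas releases warn about the uppercase `"H"` alias.

```python
import io

import pandas as pd

# Assumption: the eight sample rows from Answer 0, inlined in place of the CSV.
CSV = """host,timestamp,status,byte
192.168.102.100,21-06-2016 09:54:44,200,17811
192.168.102.100,21-06-2016 09:54:44,200,21630
192.168.100.160,21-06-2016 10:08:08,404,1098
192.168.100.160,21-06-2016 11:20:44,200,17811
192.168.100.160,21-06-2016 11:20:44,200,21630
192.168.102.100,21-06-2016 11:54:44,200,17811
192.168.102.100,21-06-2016 11:54:44,200,21630
192.168.102.100,21-06-2016 11:54:44,200,21630
"""

df = pd.read_csv(io.StringIO(CSV))
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# One bin per hour; size() returns the number of rows that fall in each bin.
hourly = df.groupby(pd.Grouper(key="timestamp", freq="60min")).size()
print(hourly)
```

On this sample the bins come out as 2 requests at 09:00, 1 at 10:00 and 5 at 11:00, which matches the example layout in the question. The resulting Series can be passed straight to `plt.plot(hourly)`.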