Question

I have a dataframe like the following:

df.head(4)
   timestamp                  user_id  category
0  2017-09-23 15:00:00+00:00  A        Bar
1  2017-09-14 18:00:00+00:00  B        Restaurant
2  2017-09-30 00:00:00+00:00  B        Museum
3  2017-09-11 17:00:00+00:00  C        Museum

I want to count the number of visitors per category for each hour, and end up with a dataframe like this:

df 
   year  month  day  hour  category  count
0  2017  9      11   0     Bar       2
1  2017  9      11   1     Bar       1
2  2017  9      11   2     Bar       0
3  2017  9      11   3     Bar       1

Answer 1

Assuming you want to group by date and hour, and the timestamp column is already a datetime column, you can group on the date and the hour of timestamp together with category, then count the rows in each group.
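A minimal sketch of grouping by date and hour, assuming timestamp can be converted with pd.to_datetime. The sample rows and the output column names date, hour, and count are made up for illustration and do not come from the original answer.

```python
import pandas as pd

# Small sample in the shape of the question's dataframe (values are made up)
df = pd.DataFrame({
    "timestamp": ["2017-09-23 15:00:00+00:00", "2017-09-23 15:30:00+00:00",
                  "2017-09-14 18:00:00+00:00", "2017-09-30 00:00:00+00:00"],
    "user_id": ["A", "B", "B", "C"],
    "category": ["Bar", "Bar", "Restaurant", "Museum"],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Group on calendar date, hour of day, and category;
# size() counts the rows that fall into each group
counts = (
    df.groupby([df["timestamp"].dt.date.rename("date"),
                df["timestamp"].dt.hour.rename("hour"),
                "category"])
      .size()
      .reset_index(name="count")
)
print(counts)
```

Note that size() counts rows, so a user who shows up twice in the same hour is counted twice. If you need distinct visitors, selecting user_id and calling nunique() instead would be one way to do it.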

Answer 2

To get the count of user_id per category per hour, you can use groupby on the parts of the datetime:

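import pandas as pd

# Make sure timestamp is a datetime column so the dt accessor is available
df['timestamp'] = pd.to_datetime(df['timestamp'])
ts = df['timestamp'].dt
# Group by each date part plus category, then count user_id rows per group
df_new = df.groupby([ts.year.rename('year'),
                     ts.month.rename('month'),
                     ts.day.rename('day'),
                     ts.hour.rename('hour'),
                     'category'])['user_id'].count()
# Turn the group keys back into columns and name the count column
df_new = df_new.reset_index(name='count')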
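One thing to watch: a plain groupby only returns the hours that actually occur in the data, while the expected output in the question also shows rows with a count of 0. Here is a rough sketch of one way to fill those in, assuming you want every hour between the first and last observation for every category. The sample rows and the names hour, hourly, and full_index are made up for illustration.

```python
import pandas as pd

# Made-up sample: two visits in hour 0, none in hour 1, one in hour 2
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2017-09-11 00:10:00+00:00",
        "2017-09-11 00:40:00+00:00",
        "2017-09-11 02:05:00+00:00",
    ]),
    "user_id": ["A", "B", "C"],
    "category": ["Bar", "Bar", "Bar"],
})

# dt.floor truncates each timestamp to the start of its hour
hour = df["timestamp"].dt.floor("h")
hourly = df.groupby([hour, "category"]).size()

# Every (hour, category) combination between the first and last observed hour
hours = pd.date_range(hour.min(), hour.max(), freq="h")
full_index = pd.MultiIndex.from_product(
    [hours, df["category"].unique()], names=["timestamp", "category"])

# Reindex so that combinations with no visits show up with a count of 0
counts = hourly.reindex(full_index, fill_value=0).reset_index(name="count")
print(counts)
```

From there the year, month, day, and hour columns can be derived from the timestamp column of counts if you need them separately.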

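When a dataframe column holds datetimes, you can use the dt accessor, which gives access to the individual parts of each datetime, such as the year.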
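A small illustration of the dt accessor, and of why the conversion with pd.to_datetime has to come first. The two timestamps are taken from the question; the names raw, ts, parts, and dt_on_strings are just placeholders for this sketch.

```python
import pandas as pd

raw = pd.Series(["2017-09-23 15:00:00+00:00", "2017-09-11 17:00:00+00:00"])

# On plain strings the dt accessor is not available and raises AttributeError
try:
    raw.dt.year
    dt_on_strings = True
except AttributeError:
    dt_on_strings = False

# After conversion, each part of the datetime can be read off directly
ts = pd.to_datetime(raw)
parts = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "day": ts.dt.day,
    "hour": ts.dt.hour,
})
print(parts)
```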

Python: How do I group a pandas dataframe to count by hour and date?
