I have a pandas DataFrame in the following format:
title | decision | Time submitted
Book1 | 1 | 1486507594
Book1 | 2 | 1485450353
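What I want to do is find the average submission time for books with decision = 1, and then the average submission time for books with decision = 2. I tried using: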
df_avg.loc[df_avg['decision'] == 2, 'submitted'].sum()
df_avg.loc[df_avg['decision'] == 1, 'submitted'].sum()
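But it sometimes doesn't work. I even tried running the above both before and after converting the times to dates with datetime. Any ideas on how to do this would be greatly appreciated.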
Answer (score: 4)
I think you can first convert the times to nanoseconds since the Unix epoch, then group with groupby and aggregate with mean:
print (df_avg)
title decision Time submitted
0 Book1 1 1486507594
1 Book1 1 1486500012
2 Book1 2 1485480353
3 Book1 2 1485450353
import numpy as np
import pandas as pd
df_avg['Time submitted'] = pd.to_datetime(df_avg['Time submitted'], unit='s').values.astype(np.int64)
df = df_avg.groupby('decision', as_index=False)['Time submitted'].mean()
df['Time submitted'] = pd.to_datetime(df['Time submitted'], unit='ns')
print (df)
decision Time submitted
0 1 2017-02-07 21:43:23
1 2 2017-01-26 21:15:53
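A self-contained sketch of the steps above, with the sample frame rebuilt from the printed output (the DataFrame construction is an assumption; the question only shows the table). The explicit cast to `datetime64[ns]` before taking the integer values is my addition, so that the integer count is always in nanoseconds regardless of the resolution pandas returns:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed frame (assumed, not from the question's code)
df_avg = pd.DataFrame({
    'title': ['Book1'] * 4,
    'decision': [1, 1, 2, 2],
    'Time submitted': [1486507594, 1486500012, 1485480353, 1485450353],
})

# Unix seconds -> datetime64 -> explicitly nanosecond resolution -> int64 nanoseconds
df_avg['Time submitted'] = (
    pd.to_datetime(df_avg['Time submitted'], unit='s')
    .values.astype('datetime64[ns]')
    .astype(np.int64)
)

# Mean nanosecond value per decision (the mean comes back as float64)
df = df_avg.groupby('decision', as_index=False)['Time submitted'].mean()

# Convert the averaged nanoseconds back to timestamps
df['Time submitted'] = pd.to_datetime(df['Time submitted'], unit='ns')
print(df)
```

With this sample data the output should match the table shown in the answer.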
But since your data is already in Unix seconds, you can also simply multiply it by 10**9 to get nanoseconds:
df = (df_avg['Time submitted'] * 10**9).groupby(df_avg['decision']).mean().reset_index()
df['Time submitted'] = pd.to_datetime(df['Time submitted'], unit='ns')
print (df)
decision Time submitted
0 1 2017-02-07 21:43:23
1 2 2017-01-26 21:15:53
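Because the mean of Unix timestamps in seconds is itself a Unix timestamp in seconds, a simpler variant (not part of the answer above) is to average the raw integers and convert only the result with `unit='s'`. The sketch below assumes the same sample frame and the column name `'Time submitted'` from the printed table (the question's attempt uses `'submitted'`). It also shows the question's boolean-mask approach with `.mean()` in place of `.sum()`, since a sum of timestamps is not an average:

```python
import pandas as pd

# Sample data rebuilt from the printed frame (assumed)
df_avg = pd.DataFrame({
    'title': ['Book1'] * 4,
    'decision': [1, 1, 2, 2],
    'Time submitted': [1486507594, 1486500012, 1485480353, 1485450353],
})

# Average the raw Unix seconds per decision; the result is still in seconds
mean_secs = df_avg.groupby('decision')['Time submitted'].mean()

# Convert only the aggregated values; unit='s' because the data is in seconds
mean_times = pd.to_datetime(mean_secs, unit='s')
print(mean_times)

# The question's mask approach, using .mean() instead of .sum()
# (column name 'Time submitted' assumed, matching the printed frame)
avg_1 = pd.to_datetime(df_avg.loc[df_avg['decision'] == 1, 'Time submitted'].mean(), unit='s')
avg_2 = pd.to_datetime(df_avg.loc[df_avg['decision'] == 2, 'Time submitted'].mean(), unit='s')
print(avg_1, avg_2)
```

Skipping the nanosecond step works here because the inputs are whole seconds. The groupby form handles every decision value at once, while the mask form is convenient when only one or two values are needed.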