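I have a pandas DataFrame like the one shown below. The index holds datetime objects, sorted by day and split into 5-minute bins. It has a column named "col1", so if I run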
df['col1']
I get:
DateTime
2008-04-28 09:40:00 300.0
2008-04-28 09:45:00 -800.0
2008-04-28 09:50:00 0.0
2008-04-28 09:55:00 -100.0
2008-04-28 10:00:00 0.0
2008-04-29 09:40:00 500.0
2008-04-29 09:45:00 800.0
2008-04-29 09:50:00 100.0
2008-04-29 09:55:00 -100.0
2008-04-29 10:00:00 0.0
I also have a second DataFrame, which I obtained from the original one with groupby:
df2 = df.groupby([df.index.time])[['col2']].mean()
Output:
col2
09:40:00 4603.585657
09:45:00 5547.011952
09:50:00 8532.007952
09:55:00 6175.298805
10:00:00 4236.055777
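What I want to do is divide col1 by col2 within each 5-minute bin, without using a for loop. To explain it better: across all days, for each bin, divide col1 by col2. For example, divide every 9:40:00 value in col1 by the 9:40:00 value in col2.
I don't know how to start on this without a for loop, but my impression is that it should be doable in pandas.
The expected output is: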
DateTime
2008-04-28 09:40:00 300.0/4603.585657
2008-04-28 09:45:00 -800.0/5547.011952
2008-04-28 09:50:00 0.0/8532.007952
2008-04-28 09:55:00 -100.0/6175.298805
2008-04-28 10:00:00 0.0/4236.055777
2008-04-29 09:40:00 500.0/4603.585657
2008-04-29 09:45:00 800.0/5547.011952
2008-04-29 09:50:00 100.0/8532.007952
2008-04-29 09:55:00 -100.0/6175.298805
2008-04-29 10:00:00 0.0/4236.055777
Answer 0 (score: 1)
If you need to divide by the mean of each time bin:
df['new'] = df['col1'].div(df.groupby(df.index.time)['col1'].transform('mean'))
print (df)
col1 new
DateTime
2008-04-28 09:40:00 300.0 0.75
2008-04-28 09:45:00 -800.0 -inf
2008-04-28 09:50:00 0.0 0.00
2008-04-28 09:55:00 -100.0 1.00
2008-04-28 10:00:00 0.0 NaN
2008-04-29 09:40:00 500.0 1.25
2008-04-29 09:45:00 800.0 inf
2008-04-29 09:50:00 100.0 2.00
2008-04-29 09:55:00 -100.0 1.00
2008-04-29 10:00:00 0.0 NaN
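The -inf, inf and NaN entries above appear because the 09:45 and 10:00 bins have a mean of zero in this sample, so the division is by zero. If that is not wanted, one possible workaround (an assumption about the desired behaviour, not something the question asks for) is to treat a zero mean as missing before dividing. The sample frame is rebuilt here from the values in the question:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
days = ["2008-04-28", "2008-04-29"]
times = ["09:40", "09:45", "09:50", "09:55", "10:00"]
idx = pd.to_datetime([f"{d} {t}" for d in days for t in times])
df = pd.DataFrame(
    {"col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0]},
    index=idx,
)
df.index.name = "DateTime"

# Mean of col1 for each time of day, broadcast back to every row.
bin_mean = df.groupby(df.index.time)["col1"].transform("mean")

# Assumption: a zero mean should give NaN rather than +/-inf.
df["new"] = df["col1"].div(bin_mean.replace(0, np.nan))
print(df)
```

With this change the 09:45 and 10:00 rows all come out as NaN, and the other rows are unchanged.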
Or, if you need to divide by the mean of each day:
df['new'] = df['col1'].div(df.groupby(df.index.date)['col1'].transform('mean'))
print (df)
col1 new
DateTime
2008-04-28 09:40:00 300.0 -2.500000
2008-04-28 09:45:00 -800.0 6.666667
2008-04-28 09:50:00 0.0 -0.000000
2008-04-28 09:55:00 -100.0 0.833333
2008-04-28 10:00:00 0.0 -0.000000
2008-04-29 09:40:00 500.0 1.923077
2008-04-29 09:45:00 800.0 3.076923
2008-04-29 09:50:00 100.0 0.384615
2008-04-29 09:55:00 -100.0 -0.384615
2008-04-29 10:00:00 0.0 0.000000
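Both snippets above divide by a mean of col1, since the sample data has no col2. If the divisor should instead come from the separate df2 in the question, a minimal sketch could look like the following. It assumes df2 is indexed by datetime.time values, as a groupby on df.index.time would produce, and the column name "ratio" is only illustrative:

```python
import pandas as pd

# Rebuild the sample data from the question.
days = ["2008-04-28", "2008-04-29"]
times = ["09:40", "09:45", "09:50", "09:55", "10:00"]
idx = pd.to_datetime([f"{d} {t}" for d in days for t in times])
df = pd.DataFrame(
    {"col1": [300.0, -800.0, 0.0, -100.0, 0.0, 500.0, 800.0, 100.0, -100.0, 0.0]},
    index=idx,
)
df.index.name = "DateTime"

# Stand-in for the question's df2: one col2 value per time of day,
# indexed by datetime.time objects (values copied from the question).
df2 = pd.DataFrame(
    {"col2": [4603.585657, 5547.011952, 8532.007952, 6175.298805, 4236.055777]},
    index=sorted(set(df.index.time)),
)

# Look up the col2 value matching each row's time of day, then divide.
time_of_day = pd.Series(df.index.time, index=df.index)
df["ratio"] = df["col1"] / time_of_day.map(df2["col2"])
print(df)
```

The map call looks up each row's time of day in df2, so no explicit loop over days or bins is needed.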