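I hope this is a simple question... I just want to track the last time a condition was met in a DataFrame. My plan is to first add a column that takes the value of the index whenever the condition is met, and then use fillna to fill the remaining rows so that every row holds the last time the condition was met. However, I can't find any way to set the new column to the index value based on a condition without getting either incorrect data or an error. Below is an example with the desired result, but I get ValueError: array is not broadcastable to correct shape. Any ideas?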
import numpy as np
import pandas as pd
rows = 50
df = pd.DataFrame(np.random.randn(rows,2), columns=list('AB'), index=pd.date_range('1/1/2000', periods=rows, freq='1H'))
df.loc[df.A > 0.5, 'LAST_TIME_A_ABOVE_X'] = df.index
# ValueError: array is not broadcastable to correct shape
df['LAST_TIME_A_ABOVE_X'] = df['LAST_TIME_A_ABOVE_X'].fillna(method='ffill')
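The ValueError appears to come from a length mismatch: the left-hand side selects only the rows where A > 0.5, while df.index on the right-hand side has one entry for every row, and an Index is treated positionally rather than aligned by label the way a Series would be. A minimal sketch to check the two lengths, assuming a fixed random seed (np.random.seed(0)) for reproducibility and freq='60min' as an equivalent spelling of the hourly frequency:

```python
import numpy as np
import pandas as pd

rows = 50
np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.randn(rows, 2), columns=list('AB'),
                  index=pd.date_range('1/1/2000', periods=rows, freq='60min'))

mask = df.A > 0.5
# Number of rows the left-hand side df.loc[mask, ...] selects.
n_selected = int(mask.sum())
# Number of values on the right-hand side: df.index has one entry per row.
n_values = len(df.index)
# The two only agree when every row satisfies the condition.
print(n_selected, n_values)
```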
Desired result:
print(df.tail())
A B LAST_TIME_A_ABOVE_X
2000-01-02 19:00:00 0.952454 0.046514 2000-01-02 19:00:00
2000-01-02 20:00:00 -0.216546 -0.254344 2000-01-02 19:00:00
2000-01-02 21:00:00 -0.237128 -0.830337 2000-01-02 19:00:00
2000-01-02 22:00:00 0.889550 0.060698 2000-01-02 22:00:00
2000-01-02 23:00:00 0.172436 -0.566921 2000-01-02 22:00:00
2000-01-03 00:00:00 1.092696 1.053605 2000-01-03 00:00:00
2000-01-03 01:00:00 1.284858 0.117552 2000-01-03 01:00:00
Thanks
Answer (score: 0):
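You also need to mask the right-hand side, so that it has the same length as the rows selected on the left. Change df.loc[df.A > 0.5, 'LAST_TIME_A_ABOVE_X'] = df.index
to df.loc[df.A > 0.5, 'LAST_TIME_A_ABOVE_X'] = df.loc[df.A > 0.5].index
: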
In [175]:
import numpy as np
import pandas as pd
rows = 50
df = pd.DataFrame(np.random.randn(rows,2), columns=list('AB'), index=pd.date_range('1/1/2000', periods=rows, freq='1H'))
# mask both sides so the lengths match
df.loc[df.A > 0.5, 'LAST_TIME_A_ABOVE_X'] = df.loc[df.A > 0.5].index
df['LAST_TIME_A_ABOVE_X'] = df['LAST_TIME_A_ABOVE_X'].ffill()
df
Out[175]:
A \
2000-01-01 00:00:00 1970-01-01 00:00:00
2000-01-01 01:00:00 1970-01-01 00:00:00.000000001
2000-01-01 02:00:00 1970-01-01 00:00:00.000000001
2000-01-01 03:00:00 1969-12-31 23:59:59.999999999
2000-01-01 04:00:00 1970-01-01 00:00:00
2000-01-01 05:00:00 1970-01-01 00:00:00.000000001
2000-01-01 06:00:00 1970-01-01 00:00:00
2000-01-01 07:00:00 1970-01-01 00:00:00
2000-01-01 08:00:00 1970-01-01 00:00:00
2000-01-01 09:00:00 1969-12-31 23:59:59.999999999
2000-01-01 10:00:00 1970-01-01 00:00:00
2000-01-01 11:00:00 1970-01-01 00:00:00
2000-01-01 12:00:00 1970-01-01 00:00:00
2000-01-01 13:00:00 1969-12-31 23:59:59.999999999
2000-01-01 14:00:00 1969-12-31 23:59:59.999999999
2000-01-01 15:00:00 1970-01-01 00:00:00
B LAST_TIME_A_ABOVE_X
2000-01-01 00:00:00 1970-01-01 00:00:00 NaT
2000-01-01 01:00:00 1970-01-01 00:00:00 2000-01-01 01:00:00
2000-01-01 02:00:00 1970-01-01 00:00:00 2000-01-01 02:00:00
2000-01-01 03:00:00 1970-01-01 00:00:00 2000-01-01 02:00:00
2000-01-01 04:00:00 1970-01-01 00:00:00 2000-01-01 02:00:00
2000-01-01 05:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 06:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 07:00:00 1969-12-31 23:59:59.999999999 2000-01-01 05:00:00
2000-01-01 08:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 09:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 10:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 11:00:00 1970-01-01 00:00:00 2000-01-01 05:00:00
2000-01-01 12:00:00 1970-01-01 00:00:00.000000001 2000-01-01 05:00:00
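The above produces wrong data: columns A and B are overwritten with timestamps near 1970-01-01. This seems to be related to the index being a DatetimeIndex, because the assigned values get broadcast across the whole df. If you reset the index, apply the same mask, and then set the index back, you get the desired output: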
In [192]:
import numpy as np
import pandas as pd
rows = 50
df = pd.DataFrame(np.random.randn(rows,2), columns=list('AB'), index=pd.date_range('1/1/2000', periods=rows, freq='1H'))
# move the DatetimeIndex into an ordinary column named 'index'
df.reset_index(inplace=True)
temp = df.loc[df.A > 0.5, 'index']
df.loc[df.A > 0.5, 'LAST_TIME_A_ABOVE_X'] = temp
# carry the last matching timestamp forward
df['LAST_TIME_A_ABOVE_X'] = df['LAST_TIME_A_ABOVE_X'].ffill()
# restore the DatetimeIndex
df.set_index('index', inplace=True)
df
Out[192]:
A B LAST_TIME_A_ABOVE_X
index
2000-01-01 00:00:00 -1.015624 1.156609 NaT
2000-01-01 01:00:00 -1.223371 -1.378067 NaT
2000-01-01 02:00:00 1.012627 -0.324465 2000-01-01 02:00:00
2000-01-01 03:00:00 1.298507 -1.216586 2000-01-01 03:00:00
2000-01-01 04:00:00 0.985638 0.058768 2000-01-01 04:00:00
2000-01-01 05:00:00 -0.815905 0.586401 2000-01-01 04:00:00
2000-01-01 06:00:00 -1.185344 2.177858 2000-01-01 04:00:00
2000-01-01 07:00:00 -0.638001 0.046314 2000-01-01 04:00:00
2000-01-01 08:00:00 -0.134608 0.294528 2000-01-01 04:00:00
2000-01-01 09:00:00 0.425651 0.709888 2000-01-01 04:00:00
2000-01-01 10:00:00 -0.378901 -0.877367 2000-01-01 04:00:00
2000-01-01 11:00:00 -0.504592 0.322824 2000-01-01 04:00:00
2000-01-01 12:00:00 1.442753 -1.145960 2000-01-01 12:00:00
2000-01-01 13:00:00 0.437722 -0.445725 2000-01-01 12:00:00
2000-01-01 14:00:00 2.509730 -0.106108 2000-01-01 14:00:00
2000-01-01 15:00:00 -0.618179 -1.079270 2000-01-01 14:00:00
2000-01-01 16:00:00 -1.377722 -1.445645 2000-01-01 14:00:00
2000-01-01 17:00:00 0.529527 -2.500947 2000-01-01 17:00:00
2000-01-01 18:00:00 -0.263954 -0.576484 2000-01-01 17:00:00
2000-01-01 19:00:00 -0.177062 0.422974 2000-01-01 17:00:00
2000-01-01 20:00:00 0.173764 2.116644 2000-01-01 17:00:00
2000-01-01 21:00:00 -1.248605 -0.594601 2000-01-01 17:00:00
2000-01-01 22:00:00 -1.138183 -0.282523 2000-01-01 17:00:00
2000-01-01 23:00:00 0.047580 0.496086 2000-01-01 17:00:00
2000-01-02 00:00:00 1.618901 -1.910404 2000-01-02 00:00:00
2000-01-02 01:00:00 0.127997 0.783554 2000-01-02 00:00:00
2000-01-02 02:00:00 0.702277 1.720010 2000-01-02 02:00:00
2000-01-02 03:00:00 -0.801874 -2.302547 2000-01-02 02:00:00
2000-01-02 04:00:00 1.636838 -0.940251 2000-01-02 04:00:00
2000-01-02 05:00:00 -1.204564 0.517969 2000-01-02 04:00:00
2000-01-02 06:00:00 -0.700013 0.075867 2000-01-02 04:00:00
2000-01-02 07:00:00 -0.234283 -1.899428 2000-01-02 04:00:00
2000-01-02 08:00:00 0.730711 0.254155 2000-01-02 08:00:00
2000-01-02 09:00:00 -0.188994 2.035390 2000-01-02 08:00:00
2000-01-02 10:00:00 1.384640 -1.319800 2000-01-02 10:00:00
2000-01-02 11:00:00 -0.288324 -1.219386 2000-01-02 10:00:00
2000-01-02 12:00:00 -0.642150 -0.449078 2000-01-02 10:00:00
2000-01-02 13:00:00 1.615771 0.497375 2000-01-02 13:00:00
2000-01-02 14:00:00 -1.422133 1.934081 2000-01-02 13:00:00
2000-01-02 15:00:00 -1.541841 1.202525 2000-01-02 13:00:00
2000-01-02 16:00:00 -2.463243 0.020996 2000-01-02 13:00:00
2000-01-02 17:00:00 -0.445203 0.462241 2000-01-02 13:00:00
2000-01-02 18:00:00 0.376458 -1.190448 2000-01-02 13:00:00
2000-01-02 19:00:00 1.040431 0.006403 2000-01-02 19:00:00
2000-01-02 20:00:00 -0.145096 -0.961192 2000-01-02 19:00:00
2000-01-02 21:00:00 -0.127414 0.604989 2000-01-02 19:00:00
2000-01-02 22:00:00 -0.054637 0.070836 2000-01-02 19:00:00
2000-01-02 23:00:00 -0.581572 0.634429 2000-01-02 19:00:00
2000-01-03 00:00:00 0.021646 0.837573 2000-01-02 19:00:00
2000-01-03 01:00:00 -1.785810 2.178076 2000-01-02 19:00:00
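A different route, not part of the original answer, avoids both the reset_index round trip and the partial .loc assignment: turn the index into a Series, blank out the timestamps where the condition fails with Series.where, and forward-fill. A minimal sketch, assuming a fixed seed and the '60min' frequency spelling as a stand-in for the data above:

```python
import numpy as np
import pandas as pd

rows = 50
np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.randn(rows, 2), columns=list('AB'),
                  index=pd.date_range('1/1/2000', periods=rows, freq='60min'))

# A Series whose values are the index timestamps themselves.
times = df.index.to_series()
# Keep a timestamp only where the condition holds (NaT elsewhere),
# then carry the last seen timestamp forward.
df['LAST_TIME_A_ABOVE_X'] = times.where(df.A > 0.5).ffill()
print(df.tail())
```

Because the whole column is assigned at once from a Series that shares df's index, pandas aligns it by label, so the length mismatch from the question does not arise.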
Edit: after reporting this on the pandas GitHub site, the conclusion was that it is a bug. The safe workaround is to assign a list instead:
df.loc[df.A > 0.5,'LAST_TIME_A_ABOVE_X'] = df.loc[df.A > 0.5].index.tolist()
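The list-based workaround combined with the forward fill from the question, as one runnable script. This is a sketch that assumes a fixed seed and the '60min' frequency spelling, and uses Series.ffill() in place of fillna(method='ffill'), which recent pandas versions deprecate:

```python
import numpy as np
import pandas as pd

rows = 50
np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.randn(rows, 2), columns=list('AB'),
                  index=pd.date_range('1/1/2000', periods=rows, freq='60min'))

mask = df.A > 0.5
# A plain list has exactly one Timestamp per selected row, so its length
# matches the rows chosen by the mask on the left-hand side.
df.loc[mask, 'LAST_TIME_A_ABOVE_X'] = df.loc[mask].index.tolist()
# Rows where A <= 0.5 are still missing; carry the last match forward.
df['LAST_TIME_A_ABOVE_X'] = df['LAST_TIME_A_ABOVE_X'].ffill()
print(df.tail())
```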