Question

My DataFrame looks like this:

    mid price   dse_high_born
0   0.002039    False
1   0.002039    False
2   0.002039    False
3   0.002039    False
4   0.002039    False
5   0.002038    False
6   0.002039    True
7   0.002037    False
8   0.002037    False
9   0.002037    False
10  0.002036    False
11  0.002036    False
12  0.002038    False
13  0.002038    False
14  0.002038    False
15  0.002038    False
16  0.002039    False
17  0.002039    False
18  0.002040    False
19  0.002040    False
20  0.002040    False
21  0.002039    False
22  0.002039    False
23  0.002039    False
24  0.002040    True
25  0.002040    False
26  0.002041    False
27  0.002041    False
28  0.002041    False
29  0.002042    False
30  0.002044    False
31  0.002049    True
32  0.002049    False
33  0.002048    False

... ...

I tried to add a new column, price, with a for loop based on the following condition:

for index, row in df.iterrows():
    if df['dse_high_born'] == True:
        df.at[index,'price'] = row['mid price']
    else:
        df.at[index,'price'] = 'nan'

I get the following error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried every combination (using bool(), any(), item(), and so on), but when I then run df[df['price'] != 'nan'], no rows matching this condition appear in my DataFrame. Why? Thanks!

Answer 1

You can do this in a simpler and more efficient way with np.where:

import numpy as np
df['price'] = np.where(df.dse_high_born, df.mid_price, np.nan)

    mid_price  dse_high_born  price
0       0.002          False    NaN
1       0.002          False    NaN
2       0.002          False    NaN
3       0.002          False    NaN
4       0.002          False    NaN
5       0.002          False    NaN
6       0.002           True  0.002
7       0.002          False    NaN
...
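
To run this end to end, here is a minimal self-contained sketch. The four-row sample frame is made up for illustration (values in the spirit of the question's data), and it assumes the column is named mid_price, as in the snippet above, rather than mid price as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the question; not the asker's full frame.
df = pd.DataFrame({
    "mid_price": [0.002039, 0.002038, 0.002039, 0.002037],
    "dse_high_born": [False, False, True, False],
})

# Keep mid_price where the flag is True, otherwise fill with NaN.
df["price"] = np.where(df["dse_high_born"], df["mid_price"], np.nan)
print(df)
```

Because the whole column is computed in one call, no explicit loop over rows is needed.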

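The problem with your code is that in the if statement, the condition df['dse_high_born'] == True: is evaluated on the entire column rather than on a specific row. You need to index on both the row and the column with .loc, as in df.loc[index,'dse_high_born']. So you want something like: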

for index, row in df.iterrows():
    if df.loc[index,'dse_high_born'] == True:
        df.loc[index,'price'] = df.loc[index,'mid_price']
    else:
        df.loc[index,'price'] = np.nan
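
To see the difference in isolation, the following sketch reproduces the error on a made-up two-row frame (the frame and the variable names whole_column, single_cell and raised are only for illustration). Comparing the whole column gives a boolean Series, which if cannot reduce to a single True or False, while indexing one cell gives a single boolean.

```python
import pandas as pd

# Hypothetical two-row frame, just to show the difference.
df = pd.DataFrame({"dse_high_born": [False, True]})

# Comparing the whole column yields a boolean Series with one entry per row.
whole_column = df["dse_high_born"] == True

raised = False
try:
    # Using a multi-element Series as an `if` condition raises ValueError.
    if whole_column:
        pass
except ValueError as exc:
    raised = True
    print(exc)

# Indexing a single cell yields one boolean, which `if` accepts.
single_cell = df.loc[1, "dse_high_born"]
print(bool(single_cell))
```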

Answer 2

The error points to df['dse_high_born'] == True. I think it should be replaced with row, like this?

for index, row in df.iterrows():
    # row is a single row, so this comparison yields one boolean
    if row['dse_high_born'] == True:
        df.at[index,'price'] = row['mid price']
    else:
        df.at[index,'price'] = 'nan'

For loop with an if statement: error message "The truth value of a Series is ambiguous"
