Python DataFrames: for loop with an if statement not working

Asked: 2017-02-21 19:59:59

Tags: python pandas for-loop dataframe

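I have a DataFrame named ES_15M_Summary, with coefficients (betas) in the column ES_15M_Summary['Rolling_OLS_Coefficient'], as shown below:

[Image: column 'Rolling_OLS_Coefficient']

If the value in the column pictured above ('Rolling_OLS_Coefficient') is greater than .08, I want a new column titled 'Long' to hold a binary 'Y'. If the value is less than .08, I want the value to be 'NaN' or 'N' (either works).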

So I am writing a for loop to run over the column. First, I created a new column named 'Long' and set it to NaN:

ES_15M_Summary['Long'] = np.nan

Then I wrote the following for loop:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'

I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

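...referring to the if statement line shown above (if ... > .08:). I don't know why I am getting this error or what is wrong with the for loop. Any help is appreciated.

1 Answer:

Answer 0: (score: 2)

I think it is better to use numpy.where: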

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
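As for why the loop in the question raises the error: the comparison there is applied to the whole column, so it produces one boolean per row instead of a single True or False, and `if` cannot decide which one to use. Here is a minimal sketch of that behaviour, assuming a small made-up frame named `df` (the name and the values are only for illustration):

```python
import pandas as pd

# Small illustrative frame; values are made up for the demonstration
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing a whole column gives a boolean Series, one value per row
mask = df['Rolling_OLS_Coefficient'] > .08
print(mask.tolist())

# Using that Series directly in `if` raises the ValueError from the question
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Reducing the Series to a single boolean is what the message suggests
any_above = bool(mask.any())  # True if at least one row exceeds the threshold
all_above = bool(mask.all())  # True only if every row exceeds the threshold
```

Neither `any()` nor `all()` gives a per-row label, which is why the mask is passed to `np.where` rather than to an `if`.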

Sample:

ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
print (ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09

mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
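The question also allows NaN instead of 'N' for the rows at or below the threshold. One possible way to get that, sketched here with a frame named `df` (a name chosen only for this example), is to map the boolean mask to the labels. I would avoid passing `np.nan` next to a string in `np.where`, since NumPy may coerce it to the text 'nan' rather than keep a real missing value.

```python
import numpy as np
import pandas as pd

# Illustrative data, same values as the sample
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

mask = df['Rolling_OLS_Coefficient'] > .08

# Map True to 'Y' and False to a real missing value (NaN)
df['Long'] = mask.map({True: 'Y', False: np.nan})

print(df)
# Rows that are not 'Y' are reported as missing by isna()
print(df['Long'].isna().tolist())
```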

A loop, which is a very slow solution:

for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index,'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index,'Long'] = 'N'
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
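If per-row Python logic is really needed, a list comprehension over the column values is one option that avoids the repeated `.loc` lookups. This is only a sketch on a frame named `df` (an illustrative name), and it will usually still be slower than `np.where` on large data:

```python
import pandas as pd

# Illustrative data, same values as the sample
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Build the labels row by row in plain Python, then assign them in one step
df['Long'] = ['Y' if value > .08 else 'N'
              for value in df['Rolling_OLS_Coefficient']]

print(df)
```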

Timings:

#3000 rows
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
#print (ES_15M_Summary)


def loop(df):
    # Operate on the DataFrame passed in rather than the global ES_15M_Summary
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index,'Long'] = 'Y'
        else:
            df.loc[index,'Long'] = 'N'
    return df

print (loop(ES_15M_Summary))


In [51]: %timeit (loop(ES_15M_Summary))
1 loop, best of 3: 2.38 s per loop

In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
1000 loops, best of 3: 555 µs per loop
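The timings above come from IPython's `%timeit`. Outside IPython, a comparison along the same lines can be sketched with the standard `timeit` module. The function names `label_with_loop` and `label_vectorized` and the 300-row size are my own choices for this example, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# 300 illustrative rows (smaller than the 3000 above so the loop finishes quickly)
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09] * 100})


def label_with_loop(frame):
    """Row-by-row labelling with iterrows and .loc (slow)."""
    out = frame.copy()
    out['Long'] = None  # object column that will hold the labels
    for index, row in frame.iterrows():
        out.loc[index, 'Long'] = 'Y' if row['Rolling_OLS_Coefficient'] > .08 else 'N'
    return out


def label_vectorized(frame):
    """Whole-column labelling with np.where (fast)."""
    out = frame.copy()
    out['Long'] = np.where(out['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
    return out


# Time a few repetitions of each approach
loop_seconds = timeit.timeit(lambda: label_with_loop(df), number=3)
vectorized_seconds = timeit.timeit(lambda: label_vectorized(df), number=3)

print('loop:       {:.4f} s for 3 runs'.format(loop_seconds))
print('vectorized: {:.4f} s for 3 runs'.format(vectorized_seconds))
```

Both functions work on a copy, so the timing does not depend on whether the 'Long' column already exists from an earlier run.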