I am trying to create a new column whose values are filled in by comparing two columns of a DataFrame. This is what I tried:
def determinecolor(row, column1, column2):
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val

datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
The error I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-182-31188e414958> in <module>()
2 # if(test_shifted['openshifted'][0] > test_pred_list[0]): print("red")
3 datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
----> 4 datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
5
6 # datasetTest['color_predicted'] = datasetTest.apply(determinePredictedcolor, axis=1)
<ipython-input-178-d1f3e204fd17> in determinecolor(row, column1, column2)
1 def determinecolor(row,column1,column2):
----> 2 if row[column1] == row[column2]:
3 val = 'k'
4 elif row[column1] > row[column2]:
5 val = 'r'
c:\python35\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1119 raise ValueError("The truth value of a {0} is ambiguous. "
1120 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121 .format(self.__class__.__name__))
1122
1123 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please help me solve this.
Edit
Here is a sample dataset:
open      high      low       close     closeTarget  openshifted  predict_close
0.104167  0.119048  0.117647  0.145833  0.104167     0.416667     0.881613
0.416667  0.285714  0         0.104167  0.4375       0.833333     0.684905
0.833333  0.761905  0.45098   0.4375    0.791667     0.8125       0.821244
0.8125    0.761905  0.784314  0.791667  0.770833     0.8125       0.920608
0.8125    0.761905  0.823529  0.770833  0.8125       0.916667     0.853668
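Answer 0 (score: 5)
You should not use pd.DataFrame.apply for operations that can be vectorized.
Instead, you can use numpy.select, which takes a list of conditions and a list of values, plus a default value for all other cases: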
import numpy as np

# conditions are checked in order; the first one that is True picks the value
conditions = [df['col1'] == df['col2'], df['col1'] > df['col2']]
values = ['k', 'r']
df['color_original'] = np.select(conditions, values, 'g')
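To make the numpy.select approach concrete, here is a minimal runnable sketch. The three-row frame is hypothetical data (not the asker's dataset), chosen only so that each of the three outcomes occurs; the column names follow the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per case (equal, smaller, greater).
df = pd.DataFrame({'openshifted': [0.5, 0.2, 0.8],
                   'close':       [0.5, 0.4, 0.3]})

# Each condition is a boolean Series covering the whole column at once.
conditions = [df['openshifted'] == df['close'],
              df['openshifted'] > df['close']]
values = ['k', 'r']

# Rows matching no condition fall back to the default, 'g'.
df['color_original'] = np.select(conditions, values, default='g')
print(df)
```

The comparisons here operate on whole columns and are never used in an if statement, so the "truth value of a Series is ambiguous" error cannot arise.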
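The error occurs because you are misusing pd.DataFrame.apply, which (with axis=1) passes each row to a function. Your code calls determinecolor itself with the whole DataFrame, so row[column1] == row[column2] compares two Series, and the if statement cannot reduce that result to a single True or False. You do not need to pass the DataFrame explicitly as an argument; give apply the function itself and supply the extra arguments as keywords: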
df['color_original'] = df.apply(determinecolor, column1='openshifted',
column2='close', axis=1)
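As a rough illustration of the difference between the two calls, the sketch below runs the row-wise version on a small hypothetical frame (not the asker's data) and then reproduces the original mistake in a try block. The name error_message is introduced here only for the demonstration.

```python
import pandas as pd

def determinecolor(row, column1, column2):
    # Inside apply(axis=1), row is a single row, so these are scalar comparisons.
    if row[column1] == row[column2]:
        return 'k'
    elif row[column1] > row[column2]:
        return 'r'
    return 'g'

# Hypothetical data: one row per case (equal, smaller, greater).
df = pd.DataFrame({'openshifted': [0.5, 0.2, 0.8],
                   'close':       [0.5, 0.4, 0.3]})

# Pass the function object; apply forwards the keyword arguments to it.
df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)

# Calling the function directly with the whole DataFrame compares two
# Series inside the if statement, which raises the ValueError from the question.
try:
    determinecolor(df, 'openshifted', 'close')
except ValueError as exc:
    error_message = str(exc)

print(df)
print(error_message)
```

This still loops over rows in Python, so on large frames a vectorized approach is likely to be considerably faster.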
Answer 1 (score: 1)
Chain two np.where calls:
x=df['openshifted']-df['close']
np.where(x>0,'r',np.where(x==0,'k','g'))
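The snippet above only computes the array; a minimal sketch of the full pattern, assuming the result should be stored in the color_original column from the question and using hypothetical three-row data, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per case (equal, smaller, greater).
df = pd.DataFrame({'openshifted': [0.5, 0.2, 0.8],
                   'close':       [0.5, 0.4, 0.3]})

# The sign of the difference decides the color.
x = df['openshifted'] - df['close']

# The outer where handles x > 0; the inner one splits the rest into == 0 and < 0.
df['color_original'] = np.where(x > 0, 'r', np.where(x == 0, 'k', 'g'))
print(df)
```

Nesting works well for three outcomes; with more branches, a list of conditions passed to np.select tends to be easier to read.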