Passing a row to a function raises an error (Pandas, Python)

Time: 2018-07-19 14:22:57

Tags: python python-3.x pandas

I am trying to create a new column whose values are filled in by comparing two columns of a dataframe. This is what I have tried:

def determinecolor(row,column1,column2):
    if row[column1] == row[column2]:
        val = 'k'
    elif row[column1] > row[column2]:
        val = 'r'
    else:
        val = 'g'
    return val
datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)

The error I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-182-31188e414958> in <module>()
      2 # if(test_shifted['openshifted'][0] > test_pred_list[0]): print("red")
      3 datasetTest.loc[:,'predict_close'] = pd.Series(test_pred_list)
----> 4 datasetTest['color_original'] = datasetTest.apply(determinecolor(datasetTest,'openshifted','close'), axis=1)
      5 
      6 # datasetTest['color_predicted'] = datasetTest.apply(determinePredictedcolor, axis=1)

<ipython-input-178-d1f3e204fd17> in determinecolor(row, column1, column2)
      1 def determinecolor(row,column1,column2):
----> 2     if row[column1] == row[column2]:
      3         val = 'k'
      4     elif row[column1] > row[column2]:
      5         val = 'r'

c:\python35\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help me solve this problem.

Edit
Here is a sample dataset:

open      high      low       close     closeTarget  openshifted  predict_close
0.104167  0.119048  0.117647  0.145833  0.104167     0.416667     0.881613
0.416667  0.285714  0         0.104167  0.4375       0.833333     0.684905
0.833333  0.761905  0.45098   0.4375    0.791667     0.8125       0.821244
0.8125    0.761905  0.784314  0.791667  0.770833     0.8125       0.920608
0.8125    0.761905  0.823529  0.770833  0.8125       0.916667     0.853668

2 answers:

Answer 0: (score: 5)

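You should not use pd.DataFrame.apply for an operation that can be vectorised.

Instead, you can use numpy.select, which takes a list of conditions, a list of matching values, and a default value for every other case: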

conditions = [df['col1'] == df['col2'], df['col1'] > df['col2']]
values = ['k', 'r']

df['color_original'] = np.select(conditions, values, 'g')
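
To make the np.select approach concrete, here is a minimal self-contained sketch. The column names openshifted and close come from the question; the three rows of values are made up for illustration (they are not the asker's data) and were chosen so that each branch is hit once.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen so that each of the three branches occurs once.
df = pd.DataFrame({
    'openshifted': [0.5, 0.9, 0.2],
    'close':       [0.5, 0.1, 0.8],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df['openshifted'] == df['close'],   # equal   -> 'k'
    df['openshifted'] > df['close'],    # greater -> 'r'
]
values = ['k', 'r']

# Rows matching neither condition fall back to the default 'g'.
df['color_original'] = np.select(conditions, values, default='g')
print(df)
```

Because the comparisons run on whole columns at once, no Python-level loop over rows is involved.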

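The error occurs because determinecolor(datasetTest, 'openshifted', 'close') is called immediately with the whole dataframe, before apply ever runs. Inside the function, row[column1] == row[column2] then compares two entire Series, and the if statement cannot reduce that result to a single True or False. pd.DataFrame.apply with axis=1 already passes each row to the function, so pass the function itself and supply the column names as keyword arguments instead of passing the dataframe explicitly: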

df['color_original'] = df.apply(determinecolor, column1='openshifted',
                                column2='close', axis=1)
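
The difference between the failing call and the working one can be reproduced in a few lines. This is a sketch under assumptions: the dataframe values are invented for illustration, and the column names are passed positionally through the args parameter of apply, which should behave the same as passing them as keywords.

```python
import pandas as pd

def determinecolor(row, column1, column2):
    # Expects a single row, so each lookup yields one scalar value.
    if row[column1] == row[column2]:
        return 'k'
    elif row[column1] > row[column2]:
        return 'r'
    return 'g'

# Hypothetical values, one row per branch.
df = pd.DataFrame({
    'openshifted': [0.5, 0.9, 0.2],
    'close':       [0.5, 0.1, 0.8],
})

# Calling the function on the whole dataframe compares two Series,
# and using that result in an `if` raises the ValueError from the question.
try:
    determinecolor(df, 'openshifted', 'close')
    raised = False
except ValueError:
    raised = True

# Passing the function object lets apply call it once per row.
colors = df.apply(determinecolor, args=('openshifted', 'close'), axis=1)
print(raised, colors.tolist())
```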

Answer 1: (score: 1)

Use two nested np.where calls:

x=df['openshifted']-df['close']
np.where(x>0,'r',np.where(x==0,'k','g'))
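
The expression above only computes the array of labels. A minimal sketch of assigning it to the new column follows; the dataframe values are made up for illustration. Note that testing a floating-point difference for exact equality with zero only matches when the two values are bit-for-bit equal, which may or may not be what is wanted for computed prices.

```python
import numpy as np
import pandas as pd

# Hypothetical values, one row per branch.
df = pd.DataFrame({
    'openshifted': [0.5, 0.9, 0.2],
    'close':       [0.5, 0.1, 0.8],
})

# Sign of the difference decides the label.
x = df['openshifted'] - df['close']

# Outer where handles x > 0; the inner one splits the rest into == 0 and < 0.
df['color_original'] = np.where(x > 0, 'r', np.where(x == 0, 'k', 'g'))
print(df)
```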