Truth value of a Series is ambiguous

Time: 2019-10-22 20:10:26

Tags: python pandas

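The data is a short sample. I will have about 11 different rules of the form "if these two conditions are True, return 'this text'" and will apply them to 3k rows. I assigned the columns to variables to avoid typing the column names for every condition.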

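I keep getting a ValueError saying the truth value is ambiguous. All the posts I have seen talk about using the bitwise & and wrapping each test in parentheses. I did that, but it still errors. I tried using the fully qualified column references, but I still get the ValueError. If I remove "self" from the function, I get a TypeError. Not sure how to figure this out.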

import pandas as pd

data = [ [3.5, 6], [-4,-8],[4,1] ]
df = pd.DataFrame(data, columns=['line','value'])

l = df['line']
v = df['value']

def errortype(self):
   if (l >=0) & (v > l):
      return 'error1'
   elif (l < 0) & (v < l):
      return 'error2'

df['test']= df.apply(errortype, axis=1)

1 Answer:

Answer 0 (score: 2)

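Try the following. The original function never uses its argument: it compares the whole columns l and v, so each if statement receives a boolean Series instead of a single True/False, which is what raises the ValueError. Index the row that apply passes in instead: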

import numpy as np
import pandas as pd

data = [ [3.5, 6], [-4,-8],[4,1] ]
df = pd.DataFrame(data, columns=['line','value'])


#l = df['line']  do not need this line
#v = df['value']  do not need this line

def errortype(row):
#     print(row)
    if (row['line'] >=0) & (row['value'] > row['line']):
        return 'error1'
    elif (row['line'] < 0) & (row['value'] < row['line']):
        return 'error2'

df['test']= df.apply(errortype, axis=1)

Output:

   line  value    test
0   3.5      6  error1
1  -4.0     -8  error2
2   4.0      1    None
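
To see why the original version failed even with & and parentheses, here is a minimal sketch that reproduces the error outside of apply. It reuses the sample data from the question; the names mask and message are only illustrative. The same mismatch explains the TypeError: apply always passes the row as one positional argument, so a function that accepts no arguments cannot be called.

```python
import pandas as pd

df = pd.DataFrame([[3.5, 6], [-4, -8], [4, 1]], columns=['line', 'value'])

l = df['line']
v = df['value']

# Comparing whole columns produces one boolean per row, not a single bool.
mask = (l >= 0) & (v > l)

try:
    # `if` needs exactly one True/False, so pandas refuses to guess.
    if mask:
        pass
except ValueError as exc:
    message = str(exc)

print(mask.tolist())
print(message)
```

Inside the row-wise function, row['line'] and row['value'] are single numbers, so the comparisons yield a single boolean and the if statement works.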

However, a better, vectorized approach is to use np.select:

cond1 = (df['line'] >= 0) & (df['value'] > df['line'])
cond2 = (df['line'] < 0) & (df['value'] < df['line'])

df['test'] = np.select([cond1,cond2],['error1','error2'],np.nan)

Output:

   line  value    test
0   3.5      6  error1
1  -4.0     -8  error2
2   4.0      1     nan
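
One caveat with the np.select call above: the choices are strings while the default is the float np.nan, so the last row ends up holding the text 'nan' (as the output shows) rather than a real missing value, and newer NumPy releases may reject the mixed types with a TypeError. A minimal sketch of one way around this, assuming None is an acceptable marker for rows where neither condition matches:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[3.5, 6], [-4, -8], [4, 1]], columns=['line', 'value'])

cond1 = (df['line'] >= 0) & (df['value'] > df['line'])
cond2 = (df['line'] < 0) & (df['value'] < df['line'])

conditions = [cond1, cond2]
labels = ['error1', 'error2']

# default=None (an assumption, not the original answer's choice) keeps
# unmatched rows as real missing values in an object-dtype column.
df['test'] = np.select(conditions, labels, default=None)

print(df)
```

With about 11 rules, each additional rule would be one more entry in conditions and one more entry in labels; np.select uses the first condition that matches for each row.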