Conditionally create a DataFrame column where the calculation of the column value varies with the row's inputs

Time: 2019-06-27 13:02:33

Tags: python dataframe conditional-statements

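I have a very long and wide DataFrame. I want to create a new column in it whose value depends on many other columns in df. The calculation needed for the value in this new column also changes depending on the values in other columns.

The answers to this question come close, but they don't quite work for me.

I will eventually have around 30 different calculations that could apply, so I'm not keen on the np.where function, which becomes hard to read with that many conditions.

I have also been strongly advised not to loop over all rows of the DataFrame with a for loop, because that would likely hurt performance (please correct me if I'm wrong).

What I tried to do: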

import pandas as pd
import numpy as np

df = pd.DataFrame()

# Information in my columns look something like this:
df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3 , 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to check against to decide upon which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    (df['text'] is None),
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]
choices = [0, 
           round(df['values2'] * 0.5 * df['values3'], 2), 
           df['values1'] + df['values2'] - df['values3'], 
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)

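I expected the correct calculation to be applied to each row of df['mynewvalue'], based on that row's value in df['text'].

Instead, I get the error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I write this so that conditions like these select the correct calculation for the df['mynewvalue'] column?

1 answer:

Answer 0 (score: 1)

The error comes from these conditions: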

conditions = [
    ... ,
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]

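You are asking whether several elements are in a list. The answer is a sequence of booleans (one per element), not a single True or False. As the error suggests, you have to decide whether the condition holds when at least one element satisfies the property (any) or when all elements satisfy it (all).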
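To make the ambiguity concrete, here is a minimal sketch on a three-row Series. The names text, mask and error_message are illustrative only and are not part of the question's code.

```python
import pandas as pd

text = pd.Series(['dab', 'def', 'bla'])

# An element-wise comparison gives one boolean per row, not a single bool.
mask = text == 'dab'
print(mask.tolist())

# any() / all() collapse the per-row booleans into a single answer.
print(mask.any())   # True if at least one row matches
print(mask.all())   # True only if every row matches

# Python's `in` needs a single True/False, so pandas refuses to guess.
error_message = None
try:
    text in ['dab', 'bla']
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Neither any nor all is what np.select needs here, though: it expects the per-row booleans themselves, which is why an element-wise test is the right tool.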

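One solution is to use pandas' isin, which tests each element of the column for membership in the list and returns a boolean Series. any and all are also available when a single True/False is needed.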

Here, using isin:

import pandas as pd
import numpy as np

# Information in my columns look something like this:
df = pd.DataFrame()

df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3, 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to test against to decide which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    # isna() flags rows with missing text; `is None` would test the Series object itself
    (df['text'].isna()),
    (df['text'].isin(someList)),
    (df['text'].isin(someOtherList)),
    (df['text'].isin(someThirdList))]
choices = [0,
           round(df['values2'] * 0.5 * df['values3'], 2),
           df['values1'] + df['values2'] - df['values3'],
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
#    text  values1  values2  values3  mynewvalue
# 0   dab        3        6      103       309.0
# 1   def        4        3      444      -437.0
# 2   bla        2       21       33       346.5
# 3  zdag        5       44      425      -376.0
# 4   etc        2       22      200       251.0
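
Since the question mentions around 30 calculations, one way to keep this readable is to hold the rules in a single table and build conditions and choices from it. This is only a sketch under assumptions: the rules list and its lambda calculations are hypothetical names introduced here, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['dab', 'def', 'bla', 'zdag', 'etc'],
    'values1': [3, 4, 2, 5, 2],
    'values2': [6, 3, 21, 44, 22],
    'values3': [103, 444, 33, 425, 200],
})

# Hypothetical rule table: (labels that trigger the rule, vectorised calculation).
rules = [
    (['dab', 'bla'], lambda d: (d['values2'] * 0.5 * d['values3']).round(2)),
    (['def', 'zdag'], lambda d: d['values1'] + d['values2'] - d['values3']),
    (['etc'], lambda d: d['values1'] + 249),
]

# One boolean mask and one candidate column per rule, in matching order.
conditions = [df['text'].isin(labels) for labels, _ in rules]
choices = [calc(df) for _, calc in rules]

# Rows matching no rule (including missing text) fall back to the default.
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
```

Adding a calculation then means adding one entry to rules. Note that every calculation is evaluated for all rows before np.select picks the matching one, so a calculation that is invalid for some rows (a division by zero, for example) would need guarding.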