Question

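I have a very long and wide DataFrame. I want to create a new column in it whose value depends on many other columns in the df. The calculation needed for the values in this new column also changes depending on the values in other columns.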

The answers to this question and this question come close, but they don't quite work for me.

I will eventually have around 30 different calculations to apply, so I'm not keen on np.where, which becomes hard to read with that many conditions.

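I have also been strongly advised not to loop over all rows of the DataFrame with a for loop, because that may hurt performance (please correct me if I'm wrong).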

What I have tried is:

import pandas as pd
import numpy as np

# Information in my columns look something like this:
df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3 , 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to check against to decide upon which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    (df['text'] is None),
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]
choices = [0, 
           round(df['values2'] * 0.5 * df['values3'], 2), 
           df['values1'] + df['values2'] - df['values3'], 
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)

I expected that, based on the row value in df['text'], the correct calculation would be applied to the same row of df['mynewvalue'].

Instead, I get the error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I program this so that I can use this kind of condition to define the correct calculation for the df['mynewvalue'] column?

Answer 1

The error comes from these conditions:

conditions = [
    ... ,
    (df['text'] in someList),
    (df['text'] in someOtherList),
    (df['text'] in someThirdList)]

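You are asking whether several elements are in a list. The answer is not a single True or False but one boolean per element. As the error suggests, you must decide whether the condition holds when at least one element satisfies the property (any) or only when all elements satisfy it (all).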
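A minimal sketch of the failure, using a hypothetical three-row Series rather than the asker's full frame. It shows that the Python `in` operator ends up needing one truth value for the whole Series, while Series.isin tests each row separately:

```python
import pandas as pd

# Hypothetical, shortened data for illustration only
text = pd.Series(['dab', 'def', 'bla'])
some_list = ['dab', 'bla']

# `in` compares each list item with the whole Series and then asks
# for a single bool, which pandas refuses to give for a Series
try:
    text in some_list
    raised = False
except ValueError:
    raised = True

# isin returns one boolean per row instead
mask = text.isin(some_list)
print(raised)
print(mask.tolist())
```

The per-row mask is the shape that np.select expects for each entry of its condition list.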

One solution is to use isin (doc) or all (doc) for pandas DataFrames.

Here using isin:

import pandas as pd
import numpy as np

# Information in my columns look something like this:
df = pd.DataFrame()

df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3, 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]

# lists to test against to decide which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

conditions = [
    # isna() tests each row; `df['text'] is None` is just a single False
    (df['text'].isna()),
    (df['text'].isin(someList)),
    (df['text'].isin(someOtherList)),
    (df['text'].isin(someThirdList))]
choices = [0,
           round(df['values2'] * 0.5 * df['values3'], 2),
           df['values1'] + df['values2'] - df['values3'],
           df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
#    text  values1  values2  values3  mynewvalue
# 0   dab        3        6      103       309.0
# 1   def        4        3      444      -437.0
# 2   bla        2       21       33       346.5
# 3  zdag        5       44      425      -376.0
# 4   etc        2       22      200       251.0
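Since the question mentions roughly 30 calculations, one way to keep the np.select pattern readable is to hold the label lists and their formulas together in a single rule table. This is only a sketch under assumptions: the name `rules` and the lambda layout are hypothetical and not part of the original answer, and every formula is still evaluated for all rows before np.select picks the matching one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['dab', 'def', 'bla', 'zdag', 'etc'],
    'values1': [3, 4, 2, 5, 2],
    'values2': [6, 3, 21, 44, 22],
    'values3': [103, 444, 33, 425, 200],
})

# Hypothetical rule table: (labels that trigger the rule, vectorized formula)
rules = [
    (['dab', 'bla'], lambda d: (d['values2'] * 0.5 * d['values3']).round(2)),
    (['def', 'zdag'], lambda d: d['values1'] + d['values2'] - d['values3']),
    (['etc'], lambda d: d['values1'] + 249),
]

# One boolean mask and one candidate column per rule
conditions = [df['text'].isin(labels) for labels, _ in rules]
choices = [calc(df) for _, calc in rules]

# Rows that match no rule (including missing text) get the default 0
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
```

Adding a further calculation then means appending one tuple to `rules`, with no change to the np.select call.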

Conditionally create a DataFrame column where the calculation of the column value changes based on row input