Question

The dataframe I am working with has 2 columns with 4 possible combinations, across several hundred groups.

| Group |   Before   |    After   |
|:-----:|:----------:|:----------:|
|   G1  |  Injection |  Injection |
|   G1  |  Injection | Production |
|   G1  | Production |  Injection |
|   G1  | Production | Production |

There are 3 pre-calculated columns that need to be pulled from depending on the before/after combination, as shown below.

| Group |   Before   |    After   |         Output         |
|:-----:|:----------:|:----------:|:----------------------:|
|   G1  |  Injection |  Injection |        df['DTI']       |
|   G1  |  Injection | Production | df['DTWF'] + df['DTP'] |
|   G1  | Production |  Injection | df['DTWF'] + df['DTI'] |
|   G1  | Production | Production |        df['DTP']       |

I have tried nesting multiple np.where's:

np.where(df['Before'] == 'Injection' & df['After'] == 'Injection', df['DTI'],
np.where(....))

which results in:

ValueError: either both or neither of x and y should be given

and nesting multiple np.logical_and calls:

np.where(np.logical_and(df['Before'] == 'Injection' & df['After'] == 'Injection'), df['DTP'])

which results in:

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have reached the limit of what I can do and need some ideas!

Answer 1

One way is to use the apply function.

Assuming your DataFrame is in the variable df, you can do the following:

import pandas as pd

df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"],
                        "After": ["Injection", "Production", "Injection", "Production"]})
def get_output(x):
    if x['Before'] == 'Injection' and x['After'] == 'Injection':
        return 'DTI'
    elif x['Before'] == 'Injection' and x['After'] == 'Production':
        return 'DTWF + DTP'
    elif x['Before'] == 'Production' and x['After'] == 'Injection':
        return 'DTWF + DTI'
    elif x['Before'] == 'Production' and x['After'] == 'Production':
        return 'DTP'

df['Output'] = df.apply(get_output, axis=1)
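Note that the function above returns labels rather than the computed values. As a rough sketch of a vectorized alternative (a different technique from apply), np.select can pick the actual column values per row. The numeric values in DTI, DTWF and DTP below are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric values for the three pre-calculated columns.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTWF": [10.0, 20.0, 30.0, 40.0],
    "DTP": [100.0, 200.0, 300.0, 400.0],
})

# One boolean mask per state; each comparison is evaluated on its own.
before_inj = df["Before"] == "Injection"
before_prod = df["Before"] == "Production"
after_inj = df["After"] == "Injection"
after_prod = df["After"] == "Production"

# Conditions and choices are matched by position.
conditions = [
    before_inj & after_inj,
    before_inj & after_prod,
    before_prod & after_inj,
    before_prod & after_prod,
]
choices = [
    df["DTI"],
    df["DTWF"] + df["DTP"],
    df["DTWF"] + df["DTI"],
    df["DTP"],
]

# Rows matching none of the four combinations get NaN.
df["Output"] = np.select(conditions, choices, default=np.nan)
```

With several hundred groups this avoids calling a Python function once per row, which is usually the slow part of apply with axis=1.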

Answer 2

Before["Injection"] does not do what you think it does. In the code you showed, it is not even defined.

What you probably want is:

import pandas as pd

# df definition, skipping Group because it is not needed here
df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"], "After": ["Injection", "Production", "Injection", "Production"]})

df["Output"] = "DTI"  # Use one of the cases as default
df.loc[(df["Before"] == "Injection") & (df["After"] == "Production"), "Output"] = "DTWF + DTP"
df.loc[(df["Before"] == "Production") & (df["After"] == "Injection"), "Output"] = "DTWF + DTI"
df.loc[(df["Before"] == "Production") & (df["After"] == "Production"), "Output"] = "DTP"
print(df)
#         After      Before      Output
# 0   Injection   Injection         DTI
# 1  Production   Injection  DTWF + DTP
# 2   Injection  Production  DTWF + DTI
# 3  Production  Production         DTP
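For comparison, the nested np.where attempt from the question can also be made to work. Two things matter: & binds more tightly than ==, so each comparison needs its own parentheses, and np.where needs both value arguments whenever either one is given. The following is only a sketch; the numeric values in DTI, DTWF and DTP are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric values for the three pre-calculated columns.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTWF": [10.0, 20.0, 30.0, 40.0],
    "DTP": [100.0, 200.0, 300.0, 400.0],
})

# Parentheses force each == to be evaluated before the & that joins them.
inj_inj = (df["Before"] == "Injection") & (df["After"] == "Injection")
inj_prod = (df["Before"] == "Injection") & (df["After"] == "Production")
prod_inj = (df["Before"] == "Production") & (df["After"] == "Injection")

# Every np.where call receives a "true" and a "false" value.
# The innermost "false" branch covers Production -> Production.
df["Output"] = np.where(
    inj_inj, df["DTI"],
    np.where(
        inj_prod, df["DTWF"] + df["DTP"],
        np.where(prod_inj, df["DTWF"] + df["DTI"], df["DTP"]),
    ),
)
```

This sketch assumes the four combinations are the only ones present, since any other row would fall through to DTP.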

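If you have many such combinations, using apply as suggested in the other answer may be more appropriate.

If you have many rows, it may make sense to save the boolean indexes (e.g. df["Before"] == "Production") to variables: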

before_prod = df["Before"] == "Production"
after_prod = df["After"] == "Production"
df.loc[before_prod & after_prod, "Output"] = "DTP"
...

If you also only have these two states, you can use the unary negation operator ~ to get the second one (almost) for free:

df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
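Putting the two saved masks and ~ together, all four cases could be filled in roughly as follows. This is a sketch that assumes "Injection" and "Production" are the only values in either column; with a third state, ~before_prod would no longer mean "Injection".

```python
import pandas as pd

df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
})

# Compute each comparison once and reuse it.
before_prod = df["Before"] == "Production"
after_prod = df["After"] == "Production"

# Injection -> Injection is the default; the other three cases overwrite it.
df["Output"] = "DTI"
df.loc[~before_prod & after_prod, "Output"] = "DTWF + DTP"
df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
df.loc[before_prod & after_prod, "Output"] = "DTP"
```

Because ~ has higher precedence than &, no extra parentheses are needed around the negated masks.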

Dataframe column based on 4 conditions, nested np.where
