The DataFrame I'm working with has 2 columns with 4 possible combinations, across hundreds of groups.
| Group | Before | After |
|:-----:|:----------:|:----------:|
| G1 | Injection | Injection |
| G1 | Injection | Production |
| G1 | Production | Injection |
| G1 | Production | Production |
There are 3 precomputed columns that need to be pulled in based on the Before/After combination, as shown below.
| Group | Before | After | Output |
|:-----:|:----------:|:----------:|:----------------------:|
| G1 | Injection | Injection | df['DTI'] |
| G1 | Injection | Production | df['DTWF'] + df['DTP'] |
| G1 | Production | Injection | df['DTWF'] + df['DTI'] |
| G1 | Production | Production | df['DTP'] |
I have tried nesting multiple np.where calls:
np.where(df['Before'] == 'Injection' & df['After'] == 'Injection', df['DTI'],
np.where(....))
which results in:
ValueError: either both or neither of x and y should be given
and nesting multiple np.logical_and calls:
np.where(np.logical_and(df['Before'] == 'Injection' & df['After'] == 'Injection'), df['DTP'])
which results in:
The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I've reached the limit of what I can work out and need some ideas!
Answer 0 (score: 0)
One approach is to use the apply function.
Assuming your DataFrame is in the variable df, you can do the following:
import pandas as pd

df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"],
                        "After": ["Injection", "Production", "Injection", "Production"]})

def get_output(x):
    # x is one row; return a label naming the column(s) to use
    if x['Before'] == 'Injection' and x['After'] == 'Injection':
        return 'DTI'
    elif x['Before'] == 'Injection' and x['After'] == 'Production':
        return 'DTWF + DTP'
    elif x['Before'] == 'Production' and x['After'] == 'Injection':
        return 'DTWF + DTI'
    elif x['Before'] == 'Production' and x['After'] == 'Production':
        return 'DTP'

# axis=1 calls get_output once per row
df['Output'] = df.apply(get_output, axis=1)
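The function above returns labels rather than numbers. If the goal is the actual values, the same row-wise function could read the precomputed columns from the row instead. This is only a minimal sketch: it assumes DTI, DTWF and DTP are numeric columns, and the values and the name get_output_value are invented for illustration.

```python
import pandas as pd

# Hypothetical numeric values; the question does not show the real data.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTWF": [10.0, 20.0, 30.0, 40.0],
    "DTP": [100.0, 200.0, 300.0, 400.0],
})

def get_output_value(row):
    """Return the value for one row based on its Before/After combination."""
    before, after = row["Before"], row["After"]
    if before == "Injection" and after == "Injection":
        return row["DTI"]
    if before == "Injection" and after == "Production":
        return row["DTWF"] + row["DTP"]
    if before == "Production" and after == "Injection":
        return row["DTWF"] + row["DTI"]
    if before == "Production" and after == "Production":
        return row["DTP"]
    return float("nan")  # any combination not listed in the question

df["Output"] = df.apply(get_output_value, axis=1)
print(df[["Before", "After", "Output"]])
```

Row-wise apply is easy to read but calls the Python function once per row, so it may be slow on large frames.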
Answer 1 (score: 0)
Before["Injection"] does not do what you think it does. In the code you showed, it is not even defined.
What you probably want is:
import pandas as pd

# df definition, skipping Group because it is not needed here
df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"], "After": ["Injection", "Production", "Injection", "Production"]})
df["Output"] = "DTI"  # Use one of the cases as the default
# Each comparison needs its own parentheses because & binds more tightly than ==
df.loc[(df["Before"] == "Injection") & (df["After"] == "Production"), "Output"] = "DTWF + DTP"
df.loc[(df["Before"] == "Production") & (df["After"] == "Injection"), "Output"] = "DTWF + DTI"
df.loc[(df["Before"] == "Production") & (df["After"] == "Production"), "Output"] = "DTP"
print(df)
#        Before       After      Output
# 0   Injection   Injection         DTI
# 1   Injection  Production  DTWF + DTP
# 2  Production   Injection  DTWF + DTI
# 3  Production  Production         DTP
If you have many such combinations, using apply as suggested in the other answer may be more appropriate.
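With many combinations, another option is a lookup table keyed on the (Before, After) pair, which avoids a long chain of conditions. This is a rough sketch rather than anything from the question; the dict name labels is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
})

# Hypothetical lookup table: one entry per (Before, After) combination.
labels = {
    ("Injection", "Injection"): "DTI",
    ("Injection", "Production"): "DTWF + DTP",
    ("Production", "Injection"): "DTWF + DTI",
    ("Production", "Production"): "DTP",
}

# Pair the two columns row by row and look each pair up.
# Pairs missing from the table become None.
df["Output"] = [labels.get(pair) for pair in zip(df["Before"], df["After"])]
print(df)
```

Adding a new combination then means adding one dict entry instead of another branch.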
If you have many rows, it may make sense to save the boolean masks (e.g. df["Before"] == "Production") in variables:
before_prod = df["Before"] == "Production"
after_prod = df["After"] == "Production"
df.loc[before_prod & after_prod, "Output"] = "DTP"
...
If you also only have these two states, you can use the unary negation operator ~ to get the second state (almost) for free:
df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
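The same kind of saved masks can also be used to pull the actual column values in one vectorized step with np.select, which takes a list of conditions and a matching list of choices. The sketch below assumes DTI, DTWF and DTP are numeric columns; the values are made up for illustration. Saving each comparison in its own variable also sidesteps the precedence problem in the question's np.where attempt, where & is evaluated before ==.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric values; the question does not show the real data.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTWF": [10.0, 20.0, 30.0, 40.0],
    "DTP": [100.0, 200.0, 300.0, 400.0],
})

# One boolean mask per state, so no ~ shortcut is needed
# even if further states show up later.
before_inj = df["Before"] == "Injection"
before_prod = df["Before"] == "Production"
after_inj = df["After"] == "Injection"
after_prod = df["After"] == "Production"

conditions = [
    before_inj & after_inj,
    before_inj & after_prod,
    before_prod & after_inj,
    before_prod & after_prod,
]
choices = [
    df["DTI"],
    df["DTWF"] + df["DTP"],
    df["DTWF"] + df["DTI"],
    df["DTP"],
]

# Rows matching none of the conditions get NaN.
df["Output"] = np.select(conditions, choices, default=np.nan)
print(df[["Before", "After", "Output"]])
```

All four choices are computed for every row before np.select picks one, which is usually acceptable for simple column arithmetic like this.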