Question

I'm new to Python and am trying to do some work with pandas dataframes.

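On the left is part of the main dataframe (df1), and on the right is the second one (df2). The goal is to fill the df1['vd_type'] column with strings based on several conditions. I can do this with nested np.where() calls, but as the nesting gets deeper it stops running altogether, so I'm looking for a more elegant solution.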

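The logic in plain English: for df1['vd_type'], if df1['shape'] equals the first two characters of df2['vd_combo'] and df1['vd_pct'] <= df2['combo_value'], return the last 3 characters of df2['vd_combo'] from the row where both conditions are true. If no row in df2 satisfies both conditions, return 'vd4'.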

Thanks in advance!

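Edit #2: I then wanted to add a third condition based on another variable. Everything else is the same, except that df1 has another column 'log_vsc' with existing values, and the goal is to fill an empty df1 column 'vsc_type' with one of 4 strings under the same scheme. The extra condition is that the 'vd_type' we just defined must match the 'vd' column produced by splitting 'vsc_combo'.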

df3 = pd.DataFrame()
df3['vsc_combo'] = ['A1_vd1_vsc1','A1_vd1_vsc2','A1_vd1_vsc3','A1_vd2_vsc1','A1_vd2_vsc2']  # etc. etc. etc.
df3['combo_value'] = [(number), (number), (number), (number), (number)]  # etc. etc.

df3[['shape','vd','vsc']] = df3['vsc_combo'].str.split('_', expand = True)

def vsc_condition( row, df3):
    df_select = df3[(df3['shape'] == row['shape']) & (df3['vd'] == row['vd_type']) & (row['log_vsc'] <= df3['combo_value'])]
    if df_select.empty:
        return 'vsc4'
    else:
        return df_select['vsc'].iloc[0]

## apply vsc_type
df1['vsc_type'] = df1.apply(vsc_condition, args=(df3,), axis=1)

This works!! Thanks again!
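
For anyone who wants to run the three-condition version end to end, here is a minimal sketch. The post does not give the real thresholds or the df1 rows, so every value in df1 and df3 below is made up for illustration; only the column names and the vsc_condition logic come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data: the real values are not shown in the post.
df1 = pd.DataFrame({'shape': ['A1', 'A1', 'A1'],
                    'vd_type': ['vd1', 'vd2', 'vd1'],
                    'log_vsc': [0.5, 1.5, 9.0]})
df3 = pd.DataFrame({'vsc_combo': ['A1_vd1_vsc1', 'A1_vd1_vsc2', 'A1_vd1_vsc3',
                                  'A1_vd2_vsc1', 'A1_vd2_vsc2'],
                    'combo_value': [1.0, 2.0, 3.0, 1.0, 2.0]})

# Split 'A1_vd1_vsc1' into its three parts.
df3[['shape', 'vd', 'vsc']] = df3['vsc_combo'].str.split('_', expand=True)

def vsc_condition(row, df3):
    # Keep the df3 rows that match on shape and vd and whose threshold
    # is at least the row's log_vsc.
    df_select = df3[(df3['shape'] == row['shape'])
                    & (df3['vd'] == row['vd_type'])
                    & (row['log_vsc'] <= df3['combo_value'])]
    if df_select.empty:
        return 'vsc4'  # fallback when no row satisfies all three conditions
    return df_select['vsc'].iloc[0]  # first match in df3's row order

df1['vsc_type'] = df1.apply(vsc_condition, args=(df3,), axis=1)
print(df1)
```

Note that the function returns the first matching row in df3's order, so it only picks the tightest threshold if df3 is sorted by combo_value within each shape and vd group, as it is in this made-up data.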

Answer 1

So your input is like:

import pandas as pd
df1 = pd.DataFrame({'shape': ['A2', 'A1', 'B1', 'B1', 'A2'],
                    'vd_pct': [0.78, 0.33, 0.48, 0.38, 0.59]})
df2 = pd.DataFrame({'vd_combo': ['A1_vd1', 'A1_vd2', 'A1_vd3', 'A2_vd1', 'A2_vd2', 'A2_vd3', 'B1_vd1', 'B1_vd2', 'B1_vd3'],
                    'combo_value': [0.38, 0.56, 0.68, 0.42, 0.58, 0.71, 0.39, 0.57, 0.69]})

If you are not opposed to creating columns in df2 (you can delete them at the end if that is a problem), you can generate two columns, shape and vd, by splitting the column vd_combo:

df2[['shape','vd']] = df2['vd_combo'].str.split('_', expand=True)

Then you can create a function condition that will be used in apply, such as:

def condition(row, df2):
    # row will be a row of df1 in apply
    # here you select only the rows of df2 that meet your conditions on shape and value
    df_select = df2[(df2['shape'] == row['shape']) & (row['vd_pct'] <= df2['combo_value'])]
    # if empty (your condition is not met), return vd4
    if df_select.empty:
        return 'vd4'
    # if your condition is met, return the 'vd' of the first matching row,
    # which is the smallest one as long as df2 is sorted by combo_value within each shape
    else:
        return df_select['vd'].iloc[0]

Now you can create the column vd_type in df1 with:

df1['vd_type'] = df1.apply(condition, args=(df2,), axis=1)

df1 is then like:

  shape  vd_pct vd_type
0    A2    0.78     vd4
1    A1    0.33     vd1
2    B1    0.48     vd2
3    B1    0.38     vd1
4    A2    0.59     vd3
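
If the row-wise apply becomes slow on a large df1, the same lookup can probably be written as a merge followed by a filter. This is a sketch of a different technique than the apply approach in this answer, not a drop-in equivalent: it picks the match with the smallest combo_value rather than the first row in df2's order, which gives the same result only when df2 is sorted by combo_value within each shape. The names pairs and first_match are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'shape': ['A2', 'A1', 'B1', 'B1', 'A2'],
                    'vd_pct': [0.78, 0.33, 0.48, 0.38, 0.59]})
df2 = pd.DataFrame({'vd_combo': ['A1_vd1', 'A1_vd2', 'A1_vd3',
                                 'A2_vd1', 'A2_vd2', 'A2_vd3',
                                 'B1_vd1', 'B1_vd2', 'B1_vd3'],
                    'combo_value': [0.38, 0.56, 0.68, 0.42, 0.58, 0.71,
                                    0.39, 0.57, 0.69]})
df2[['shape', 'vd']] = df2['vd_combo'].str.split('_', expand=True)

# Pair every df1 row with all df2 rows of the same shape.
# reset_index() keeps df1's original index in a column named 'index'.
pairs = df1.reset_index().merge(df2, on='shape', how='left')

# Keep only the pairs that satisfy the threshold condition.
pairs = pairs[pairs['vd_pct'] <= pairs['combo_value']]

# For each original df1 row, take the match with the smallest combo_value.
first_match = (pairs.sort_values('combo_value')
                    .groupby('index')['vd']
                    .first())

# Rows with no match are missing from first_match and fall back to 'vd4'.
df1['vd_type'] = first_match.reindex(df1.index).fillna('vd4')
print(df1)
```

The merge builds one intermediate row per (df1 row, df2 row) pair within a shape, so it trades memory for speed; with only a handful of thresholds per shape that is usually a reasonable trade.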
