Add a new column to a DataFrame with values based on multiple conditions

Time: 2020-08-19 22:06:58

Tags: python pandas dataframe

import numpy as np
import pandas as pd
df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000], 'rate': [2, 4, 6.5, 7, 5], 'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'], 'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

My DataFrame is df, and I need to fill a new column with values based on the following conditions:

  1. If 'name' equals 'last_name', then 'salary' + 'other'

  2. If 'last_name' is null, then 'salary' + 'other'

  3. If 'name' does not equal 'last_name', then ('rate' * 'other') + 'salary'

I tried the following code, but it doesn't give the correct result:

if np.where(df["name"] == df["last_name"]) is True:
    df['new_col'] = df['salary'] + df['other']
else:
    df['new_col'] = (df['rate'] * df['other']) + df['salary']
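The attempt above fails because, when called with only a condition, np.where returns a tuple of index arrays (the positions where the condition holds), not a boolean. `x is True` is only satisfied by the bool singleton True itself, so the if test is always False and the else branch assigns the rate formula to every row. A single if/else also cannot choose a different formula per row. A minimal check, using a two-row frame made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative two-row frame: only row 0 has name == last_name
df = pd.DataFrame({"name": ["bob", "sam"], "last_name": ["bob", "gan"]})

# With a single argument, np.where returns a tuple of index arrays, not a bool
result = np.where(df["name"] == df["last_name"])
print(type(result))    # <class 'tuple'>
print(result is True)  # False, so the `else` branch always runs
```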

2 answers:

Answer 0 (score: 1)

You can do all of this with pandas DataFrame filtering. When you write something like df["name"] == df["last_name"], you create a boolean Series (called a "mask") that you can then use to index into the DataFrame.

# condition 1 - name == last name
name_equals_lastname = df["name"] == df["last_name"]  # first, create the boolean mask
df.loc[name_equals_lastname, "new_col"] = df["salary"] + df["other"]  # then, use the mask to index into the DataFrame at the correct positions and just set those values

# condition 2 - last name is null
last_name_is_null = df["last_name"].isnull()
df.loc[last_name_is_null, "new_col"] = df["salary"] + df["other"]

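# condition 3 - name != last name, and last name is not null
# (NaN != "x" is True, so without the notnull() check this mask would
# overwrite the null rows already set by condition 2)
name_not_equal_to_last_name = (df["name"] != df["last_name"]) & df["last_name"].notnull()
df.loc[name_not_equal_to_last_name, "new_col"] = (df["rate"] * df["other"]) + df["salary"]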
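A self-contained sketch of the same mask-and-.loc approach on the sample data from the question. Because conditions 1 and 2 use the same formula, it combines them into one mask and uses its complement for condition 3; the complement automatically excludes the NaN row, which a plain `!=` comparison would not. The mask names `same_or_missing` and `different` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

# Conditions 1 and 2 share a formula, so one mask covers both
same_or_missing = (df["name"] == df["last_name"]) | df["last_name"].isnull()
# Condition 3: name differs from a non-null last_name
different = ~same_or_missing

df.loc[same_or_missing, "new_col"] = df["salary"] + df["other"]
df.loc[different, "new_col"] = df["rate"] * df["other"] + df["salary"]

print(df[["name", "last_name", "new_col"]])
```

With the sample data, new_col comes out as 6000, 15000, 11200, 8500 and 11000; the "jam" row (null last_name) gets 3500 + 5000 rather than the rate formula.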

You can also use df.apply() with a custom function, like this:

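def my_logic(row):
    # conditions 1 and 2 - last name is null, or name == last name
    if pd.isnull(row["last_name"]) or row["name"] == row["last_name"]:
        return row["salary"] + row["other"]
    # condition 3 - name differs from a non-null last name
    return (row["rate"] * row["other"]) + row["salary"]

df["new_col"] = df.apply(my_logic, axis=1)  # axis=1 passes each row (rather than each column) to my_logic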

Answer 1 (score: 0)

Based on your conditions, you don't need if-else. Just use np.where combined with a boolean mask.
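A minimal sketch of the np.where approach, assuming the sample DataFrame from the question. With three arguments, np.where(condition, x, y) picks x where the condition is True and y elsewhere, row by row, so conditions 1 and 2 go into one mask and condition 3 becomes the "else" value. The name `mask` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

# True where condition 1 (same name) or condition 2 (null last_name) holds
mask = (df["name"] == df["last_name"]) | df["last_name"].isnull()

# Row-wise choice: salary + other where mask is True, else rate * other + salary
df["new_col"] = np.where(mask,
                         df["salary"] + df["other"],
                         df["rate"] * df["other"] + df["salary"])

print(df[["name", "last_name", "new_col"]])
```

Unlike the if/else in the question, this evaluates the condition per row and never needs a Python-level branch.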