Question

import numpy as np
import pandas as pd
df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000], 'rate': [2, 4, 6.5, 7, 5], 'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'], 'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

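My DataFrame is df, and I need to fill a new column with values based on the following conditions:

If 'name' equals 'last_name', then 'salary' + 'other'
If 'last_name' is null, then 'salary' + 'other'
If 'name' does not equal 'last_name', then ('rate' * 'other') + 'salary'

I tried the following code, but it does not give the correct result: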

if np.where(df["name"] == df["last_name"]) is True:
    df['new_col'] = df['salary'] + df['other']
else:
    df['new_col'] = (df['rate'] * df['other']) + df['salary']

Answer 1

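You can do all of this in one pass with pandas DataFrame filtering. When you evaluate an expression such as df["name"] == df["last_name"], you create a boolean Series (called a "mask"), which you can then use to index into the DataFrame.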

# condition 1 - name == last name
name_equals_lastname = df["name"] == df["last_name"]  # first, create the boolean mask
df.loc[name_equals_lastname, "new_col"] = df["salary"] + df["other"]  # then, use the mask to index into the DataFrame at the correct positions and just set those values

# condition 2 - last name is null
last_name_is_null = df["last_name"].isnull()
df.loc[last_name_is_null, "new_col"] = df["salary"] + df["other"]

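# condition 3 - name != last name (rows with a null last name are excluded: NaN != x is True and would overwrite condition 2)
name_not_equal_to_last_name = (df["name"] != df["last_name"]) & df["last_name"].notnull()
df.loc[name_not_equal_to_last_name, "new_col"] = (df["rate"] * df["other"]) + df["salary"]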
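To see what the masks contain, here is a small self-contained sketch on the sample data from the question. The variable names equal, not_equal and is_null are illustrative only. It also shows how a missing last_name behaves in comparisons: == gives False and != gives True, which is worth keeping in mind when ordering or combining the conditions.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

equal = df["name"] == df["last_name"]      # False where last_name is NaN
not_equal = df["name"] != df["last_name"]  # True where last_name is NaN
is_null = df["last_name"].isnull()         # True only for the NaN row

# Put the three masks side by side for inspection
print(pd.DataFrame({"equal": equal, "not_equal": not_equal, "is_null": is_null}))
```

The row for 'jam' is True in both not_equal and is_null, so a plain != mask also selects the null row.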

You can also use df.apply() with a custom function, like this:

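def my_logic(row):
    # conditions 1 and 2: the names match, or last_name is missing
    if row["name"] == row["last_name"] or pd.isnull(row["last_name"]):
        return row["salary"] + row["other"]
    # condition 3: the names differ
    return (row["rate"] * row["other"]) + row["salary"]

df["new_col"] = df.apply(my_logic, axis=1)  # you need axis=1 to pass rows rather than columns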

Answer 2

Based on your conditions, you do not need if-else. Just use np.where combined with a boolean mask.
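A minimal sketch of the np.where approach, assuming the sample df from the question. The mask name use_sum is illustrative and not taken from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

# Conditions 1 and 2 share a formula, so combine them into one mask
use_sum = (df["name"] == df["last_name"]) | df["last_name"].isnull()

# Three-argument np.where picks element-wise: first value where the mask is True, second otherwise
df["new_col"] = np.where(
    use_sum,
    df["salary"] + df["other"],
    df["rate"] * df["other"] + df["salary"],
)

print(df[["name", "last_name", "new_col"]])
```

Called with only a condition, np.where returns a tuple of index arrays, and a tuple is never `is True`. That is why the if/else in the question always takes the else branch and applies one formula to every row.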

Add a new column to a DataFrame with values based on multiple conditions
