I don't know how to vectorize my function

Time: 2019-07-31 10:18:02

Tags: python pandas

Here is my function:

def if_rule2(row):
    item1 = row['first_item']
    item2 = row['second_item']
    weight = row['weight']
    if item2 == item1:
        basic_score = weight
        add_score = 0
    # ('No') is just the string 'No', so `in` was a substring test; use == instead
    elif item1 == 'No' and item2 == 'Yes':
        basic_score = weight
        add_score = weight * 0.1
    # NOTE: this branch repeats the previous condition, so it can never run
    elif item1 == 'No' and item2 == 'Yes':
        basic_score = 1
        add_score = 0
    else:
        basic_score = 0
        add_score = 0
    return [basic_score, add_score]

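I need to speed up my code a little, and I'm interested in how much vectorization would help. I need to vectorize the function so that it can be called like this: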

df[['basic_score', 'additional_score']] = if_rule(df['first_item'], df['second_item'], df['weight'])

instead of:

df[['basic_score', 'additional_score']] = df.apply(if_rule2, axis=1)

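How should I go about this?

1 Answer:

Answer 0 (score: 0)

This solution leaves out the third condition because it is identical to the second, but you get the idea.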

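  • First create the columns with the default values from the else branch
  • Build a boolean vector for each specific condition
  • Overwrite the columns on the slice of rows selected by that vector
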
# default
df['basic_score'] = 0
df['add_score'] = 0

# first condition
first_condition = (df['first_item'] == df['second_item'])
df.loc[first_condition, 'basic_score'] = df.loc[first_condition, 'weight']
df.loc[first_condition, 'add_score'] = 0

# second condition
second_condition = ((df['first_item'] == 'No') & (df['second_item'] == 'Yes'))
df.loc[second_condition, 'basic_score'] = df.loc[second_condition, 'weight']
df.loc[second_condition, 'add_score'] = df.loc[second_condition, 'weight'] * 0.1
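
If the call form from the question is preferred, the same logic can be wrapped in a function that takes three Series. The following is only a sketch under some assumptions: it uses `np.select` instead of the `.loc` assignments above, it skips the unreachable third branch, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd


def if_rule(first_item, second_item, weight):
    """Vectorized sketch of if_rule2: takes three Series, returns a two-column DataFrame."""
    same = first_item == second_item
    no_to_yes = (first_item == 'No') & (second_item == 'Yes')
    conditions = [same, no_to_yes]
    # np.select uses the first matching condition for each row, like if/elif;
    # rows matching neither condition get the default (the else branch).
    basic = np.select(conditions, [weight, weight], default=0)
    add = np.select(conditions, [0, weight * 0.1], default=0)
    return pd.DataFrame(
        {'basic_score': basic, 'additional_score': add},
        index=weight.index,
    )


# Hypothetical sample data, not from the question
df = pd.DataFrame({
    'first_item': ['Yes', 'No', 'Yes', 'No'],
    'second_item': ['Yes', 'Yes', 'No', 'No'],
    'weight': [10, 20, 30, 40],
})

df[['basic_score', 'additional_score']] = if_rule(
    df['first_item'], df['second_item'], df['weight']
)
print(df)
```

Returning a DataFrame keeps the two result columns aligned with the original index, so the assignment works on recent pandas versions; on older versions you may need to create the two columns first.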