Vectorizing multiplication and dictionary mapping on a Pandas DataFrame without iteration?

Asked: 2016-07-19 23:03:23

Tags: python loops pandas dataframe vectorization

I have a Pandas DataFrame, df:

import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})

and a dict, mask:

mask = {1:32,2:64,3:100,4:200}

I want my final result to be a DataFrame like this:

A    B    C
1    1    32
2    2    64
2    3    96
4    4    400
nan  5    nan

Right now I am doing it like this, which seems inefficient:

for idx, row in df.iterrows():
    if not math.isnan(row['A']):
        if row['A'] != 1:
            df.loc[idx, 'C'] = row['B'] * mask[row['A'] - 1]
        else:
            df.loc[idx, 'C'] = row['B'] * mask[row['A']]

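Is there a simple way to vectorize this?

2 Answers:

Answer 0: (score: 3)

Here is an option using apply together with the dictionary's get method, which returns None if the key is not in the dictionary: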

df['C'] = df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis = 1) * df.B

df    
#   A   B   C
#0  1   1   32
#1  2   2   64
#2  2   3   96
#3  4   4   400
#4  NaN 5   NaN
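
As a small sketch of why the get lookup tolerates the NaN row, assuming only the mask dict from the question (the names hit and miss are illustrative and not part of the answer):

```python
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# A float key that equals an int key hashes the same way,
# so 2.0 finds the entry stored under 2.
hit = mask.get(2.0)

# NaN never compares equal to any key, so get falls back to None
# instead of raising KeyError the way mask[...] would.
miss = mask.get(float("nan") - 1)

print(hit, miss)
```

This is why the lambda needs no explicit NaN check: the row with a missing A simply produces no factor, and the output above shows NaN in column C for that row.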

Answer 1: (score: 3)

This should work:

df['C'] = df.B * (df.A - (df.A != 1)).map(mask)

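
The one-liner may be easier to follow when split into steps. The following is only a sketch of the same expression; the intermediate names not_one, keys and factors are introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# True wherever A is not 1 (NaN != 1 is also True).
not_one = df.A != 1

# Subtracting the boolean Series treats True as 1 and False as 0,
# so every key except 1 is shifted down by one; NaN stays NaN.
keys = df.A - not_one

# map looks each key up in the dict; keys with no match (here NaN) become NaN.
factors = keys.map(mask)

df['C'] = df.B * factors
print(df)
```

The boolean subtraction replaces the if/else branch of the loop, and map replaces the per-row dictionary lookup, so no Python-level iteration over rows remains.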

Timing

10,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(2000)])

[Image: timing results for 10,000 rows]

100,000 rows

# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(20000)])

[Image: timing results for 100,000 rows]
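
To reproduce a comparison like this locally, one possible sketch uses the standard timeit module. The wrapper names with_apply and with_map are hypothetical, ignore_index=True is an added assumption to keep the index unique, and the numbers printed will depend on the machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

mask = {1: 32, 2: 64, 3: 100, 4: 200}
base = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})

def with_apply(df):
    # Row-wise lookup, as in Answer 0.
    return df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis=1) * df.B

def with_map(df):
    # Vectorized lookup, as in Answer 1.
    return df.B * (df.A - (df.A != 1)).map(mask)

# 2000 and 20000 copies of the 5-row frame give 10,000 and 100,000 rows.
for repeats in (2000, 20000):
    df = pd.concat([base for _ in range(repeats)], ignore_index=True)
    for func in (with_apply, with_map):
        seconds = timeit.timeit(lambda: func(df), number=1)
        print(f"{len(df):>7} rows  {func.__name__:<10}  {seconds:.4f} s")
```

A single run per case keeps the sketch quick; raising number and averaging would give steadier figures.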