I have a Pandas DataFrame, df:
import pandas as pd
import numpy as np
import math
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
and a dict, mask:
mask = {1:32,2:64,3:100,4:200}
I want my final result to be a DataFrame like this:
A B C
1 1 32
2 2 64
2 3 96
4 4 400
nan 5 nan
Right now I am doing it like this, which seems inefficient:
for idx, row in df.iterrows():
    if not math.isnan(row['A']):
        if row['A'] != 1:
            df.loc[idx, 'C'] = row['B'] * mask[row['A'] - 1]
        else:
            df.loc[idx, 'C'] = row['B'] * mask[row['A']]
Is there a simple way to vectorize this?
Answer 0: (score: 3)
Here is an option using apply together with the dictionary's get method, which returns None if the key is not in the dictionary:
df['C'] = df.apply(lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis = 1) * df.B
df
# A B C
#0 1 1 32
#1 2 2 64
#2 2 3 96
#3 4 4 400
#4 NaN 5 NaN
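To see why get is used here rather than indexing with mask[...], here is a minimal sketch of the two lookups on their own. The variable names hit, miss and raised are only for illustration and do not come from the answer.

```python
import numpy as np

mask = {1: 32, 2: 64, 3: 100, 4: 200}

# A float that equals an int key hashes the same, so the lookup succeeds
hit = mask.get(np.float64(1.0))

# NaN - 1 is still NaN, which is not a key, so get() returns None
miss = mask.get(np.nan - 1)

# Plain indexing with the same missing key raises KeyError instead
try:
    mask[np.nan]
    raised = False
except KeyError:
    raised = True
```

This is why the row with A = NaN ends up with a missing value in C rather than stopping the apply call with an exception.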
Answer 1: (score: 3)
This should work:
df['C'] = df.B * (df.A - (df.A != 1)).map(mask)
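The one-liner packs several steps together, so here is a sketch that spells them out. The intermediate names shift, key, factor and key_alt are mine, and the explicit astype(int) is an assumption I added for readability; the answer relies on the boolean being treated as 0/1 implicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
mask = {1: 32, 2: 64, 3: 100, 4: 200}

# 1 where A is not 1, else 0 (NaN != 1 is True, so the NaN row gets 1)
shift = (df.A != 1).astype(int)

# Lookup key: A itself when A == 1, otherwise A - 1; NaN stays NaN
key = df.A - shift

# map() looks each key up in the dict; NaN or unknown keys become NaN
factor = key.map(mask)

df['C'] = df.B * factor

# The same key written with np.where, as an alternative spelling
key_alt = pd.Series(np.where(df.A == 1, df.A, df.A - 1), index=df.index)
```

Because every step works on whole columns, there is no Python-level loop over rows, which is the point of vectorizing.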
10,000 rows
# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(2000)])
100,000 rows
# Initialize each run with
df = pd.DataFrame({'A':[1,2,2,4,np.nan],'B':[1,2,3,4,5]})
df = pd.concat([df for _ in range(20000)])
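If you want to time the two answers on frames like these yourself, here is a minimal sketch using the standard timeit module. The helpers make_df, with_apply and with_map are hypothetical names, and ignore_index=True is my addition so the repeated frame gets a unique index; the setup above keeps the repeated 0-4 index. Actual timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

mask = {1: 32, 2: 64, 3: 100, 4: 200}

def make_df(copies):
    # Hypothetical helper: repeat the 5-row example `copies` times
    base = pd.DataFrame({'A': [1, 2, 2, 4, np.nan], 'B': [1, 2, 3, 4, 5]})
    return pd.concat([base for _ in range(copies)], ignore_index=True)

def with_apply(df):
    # Row-wise approach from answer 0
    return df.apply(
        lambda r: mask.get(r.A) if r.A == 1 else mask.get(r.A - 1), axis=1
    ) * df.B

def with_map(df):
    # Column-wise approach from answer 1
    return df.B * (df.A - (df.A != 1)).map(mask)

df = make_df(2000)  # 10,000 rows; use make_df(20000) for 100,000 rows

for fn in (with_apply, with_map):
    # Best of three single runs, in seconds
    best = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```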