I have code that uses a DataFrame to look up a value (P) given a column label (X):
df_1 = pd.DataFrame({'X': [1,2,3,1,1,2,1,3,2,1]})
df_2 = pd.DataFrame({1: [1,2,3,4,1,2,3,4,1,2],
                     2: [4,1,2,3,4,1,2,1,2,3],
                     3: [2,3,4,1,2,3,4,1,2,5]})
df_1['P'] = df_2.lookup(df_1.index, df_1['X'])
When df_1 contains a label that is not a column of df_2, like this:
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})
I get:
KeyError: 'One or more column labels was not found'
How can I skip those missing labels, so that I get:
   X    P
0  7  NaN
1  2    1
2  3    4
3  1    4
4  1    1
5  2    1
6  1    3
7  3    1
8  2    2
9  1    2
Answer 0 (score: 2)
Based on the documentation, add try ... except:
result = []
for row, col in zip(df_1.index, df_1.X):
    try:
        result.append(df_2.loc[row, col])
    except KeyError:
        # label not present in df_2: fall back to NaN
        result.append(np.nan)
result
Out[135]: [nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
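Answer 1 (score: 2)
Using get with a default value:
def get_lu(df):
    def lu(i, j):
        # df.get(j, {}) returns an empty dict when column j is missing,
        # so the second get falls through to NaN
        return df.get(j, {}).get(i, np.nan)
    return lu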
[*map(get_lu(df_2), df_1.index, df_1.X)]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()]
[nan, 1, 4, 4, 1, 1, 3, 1, 2, 2]
df_1.assign(P=[df_2.get(j, {}).get(i, np.nan) for i, j in df_1.X.items()])
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
df_1.assign(P=[df_2.rename_axis('X', axis=1).stack().get(x, np.nan) for x in df_1.X.items()])
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
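The stack-based variant above restacks df_2 for every row. As a rough sketch of the same idea (not taken from the original answer), the frame can be stacked once and then reindexed with all (row, column) pairs in a single call. The names stacked and pairs are illustrative, and the sketch assumes df_2 has unique row and column labels.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Stack df_2 once into a Series keyed by (row label, column label).
stacked = df_2.stack()

# One (row, column) pair per row of df_1; pairs absent from `stacked` become NaN.
pairs = pd.MultiIndex.from_arrays([df_1.index, df_1['X']])
df_1['P'] = stacked.reindex(pairs).to_numpy()

print(df_1)
```

Because reindex fills missing keys with NaN, no explicit default or exception handling is needed, and P comes back as a float column.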
Answer 2 (score: 1)
A bit slower than @piRSquared's answer, but uses loc + lambda:
>> df_1['P'] = df_1.apply(lambda x: df_2.loc[x.name, x.values[0]] if x.values[0] in df_2.columns else np.nan, axis=1)
>> df_1
   X    P
0  7  NaN
1  2  1.0
2  3  4.0
3  1  4.0
4  1  1.0
5  2  1.0
6  1  3.0
7  3  1.0
8  2  2.0
9  1  2.0
Answer 3 (score: 1)
This answer uses numpy and is fast...
import numpy as np
import pandas as pd
Set up the dataframes:
df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]})
df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2],
2 : [4,1,2,3,4,1,2,1,2,3],
3 : [2,3,4,1,2,3,4,1,2,5]})
# designate working columns
lookup_cols = [1, 2, 3]
key_col = 'X'
result_col = 'P'
# get key column values as an array
key = df_1[key_col].values
# make an array of nans to hold the lookup results
result = np.full(key.shape[0], np.nan)
# create a boolean array containing only valid lookup indexes
b = np.isin(key, lookup_cols)
# filter df_1 and df_2 with boolean array b
df_1b = df_1[b]
df_2b = df_2[b]
# lookup values using filtered dataframes
lup = df_2b.lookup(df_1b.index, df_1b[key_col])
# put the results into the result array at proper index locations using b
result[b] = lup
# assign the result array to the dataframe result column
df_1[result_col] = result
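Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the code above only runs on older versions. The following is a minimal sketch of the same masking idea without lookup, using Index.get_indexer and NumPy fancy indexing. It is not part of the original answer, the names col_pos, row_pos and valid are illustrative, and it assumes df_1 and df_2 have the same number of rows in the same order.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Map each key to a column position in df_2; -1 marks a label that is absent.
col_pos = df_2.columns.get_indexer(df_1['X'])
valid = col_pos >= 0

# Assumption: rows are matched by position, not by index label.
row_pos = np.arange(len(df_1))

# Start from all-NaN and fill only the positions whose label exists.
result = np.full(len(df_1), np.nan)
result[valid] = df_2.to_numpy()[row_pos[valid], col_pos[valid]]
df_1['P'] = result

print(df_1)
```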
Answer 4 (score: 0)
If I want to use another column of df_1 instead of the index, piRSquared's answer becomes:
df_1 = pd.DataFrame({'M': ['X','Y','Z','X','Y','F','Y'],
                     'N': ['A','C','B','B','A','A','F']})
df_2 = pd.DataFrame({'A': [1,2,3],
                     'B': [4,1,2],
                     'C': [2,3,4]},
                    index=['X', 'Y', 'Z'])
def get_lu(df):
    def lu(i, j):
        return df.get(j, {}).get(i, np.nan)
    return lu
df_1['O'] = [*map(get_lu(df_2), df_1.M, df_1.N)]
Which gives:
   M  N    O
0  X  A  1.0
1  Y  C  3.0
2  Z  B  2.0
3  X  B  4.0
4  Y  A  2.0
5  F  A  NaN
6  Y  F  NaN
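For larger frames, the label-pair lookup can also be done without a Python-level loop. The helper below is a hypothetical sketch, not from any of the answers: lookup_or_nan is an invented name, and it assumes df_2 holds numeric values and has unique row and column labels (Index.get_indexer raises on duplicates).

```python
import numpy as np
import pandas as pd

def lookup_or_nan(df, row_labels, col_labels):
    """Hypothetical helper: pairwise df.loc[row, col], NaN if either label is missing."""
    row_pos = df.index.get_indexer(row_labels)    # -1 where the row label is absent
    col_pos = df.columns.get_indexer(col_labels)  # -1 where the column label is absent
    valid = (row_pos >= 0) & (col_pos >= 0)
    out = np.full(len(row_pos), np.nan)
    out[valid] = df.to_numpy()[row_pos[valid], col_pos[valid]]
    return out

df_1 = pd.DataFrame({'M': ['X', 'Y', 'Z', 'X', 'Y', 'F', 'Y'],
                     'N': ['A', 'C', 'B', 'B', 'A', 'A', 'F']})
df_2 = pd.DataFrame({'A': [1, 2, 3],
                     'B': [4, 1, 2],
                     'C': [2, 3, 4]},
                    index=['X', 'Y', 'Z'])

df_1['O'] = lookup_or_nan(df_2, df_1['M'], df_1['N'])
print(df_1)
```

The result is a float array, so a missing row label (F in column M) and a missing column label (F in column N) both surface as NaN, as in the table above.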