I have two DataFrames, and I want to do a lookup much like VLOOKUP in Excel.
df_orig.head()
   A
0  3
1  4
2  6
3  7
4  8
df_new
    Combined                    Length  Group_name
0   [8, 9, 112, 114, 134, 135]  6       Group 1
1   [15, 16, 17, 18, 19, 20]    6       Group 2
2   [15, 16, 17, 18, 19]        5       Group 3
3   [16, 17, 18, 19, 20]        5       Group 4
4   [15, 16, 17, 18]            4       Group 5
5   [8, 9, 112, 114]            4       Group 6
6   [18, 19, 20]                3       Group 7
7   [28, 29, 30]                3       Group 8
8   [21, 22]                    2       Group 9
9   [28, 29]                    2       Group 10
10  [26, 27]                    2       Group 11
11  [24, 25]                    2       Group 12
12  [3, 4]                      2       Group 13
13  [6, 7]                      2       Group 14
14  [11, 14]                    2       Group 15
15  [12, 13]                    2       Group 16
16  [0, 1]                      2       Group 17
How can I add the values from df_new["Group_name"] to df_orig? The "Group_name" must be based on looking up the value of df_orig["A"] in df_new["Combined"]. So, for example, the row with A = 3 should get Group 13, because 3 appears in the Combined list of that group.
Thanks!
Answer 0: (score: 1)
Here is one way to mimic VLOOKUP. A minimal example follows.
import pandas as pd

# Sample data: the values to look up, and the groups with their value lists
df_origin = pd.DataFrame({'A': [3, 11, 0, 12, 6]})
df_new = pd.DataFrame({'Combined': [[3, 4, 5], [6, 7], [11, 14, 20],
                                    [12, 13], [3, 1], [0, 4]],
                       'Group_name': ['Group 13', 'Group 14', 'Group 15',
                                      'Group 16', 'Group 17', 'Group 18']})

# ID = first element of each Combined list
df_new['ID'] = list(zip(*df_new['Combined'].tolist()))[0]
# Map A to Group_name through an ID-indexed Series (first match wins)
df_origin['Group_name'] = df_origin['A'].map(df_new.drop_duplicates('ID')
                                             .set_index('ID')['Group_name'])
Result
    A   Group_name
0   3   Group 13
1   11  Group 15
2   0   Group 18
3   12  Group 16
4   6   Group 14
Explanation
Extract the first element of each list in df_new['Combined'] via zip. Use drop_duplicates, then create a Series mapping ID to Group_name. Finally, pd.Series.map maps df_origin['A'] to Group_name through this Series.
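Note that the mapping above keys only on the first element of each list, so a value such as 4 or 7 that appears later in a list would not be matched. A minimal sketch of one way to look up every element, assuming the first group containing a value should win; the sample data and the name `lookup` are illustrative, not from the question:

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's frames
df_origin = pd.DataFrame({'A': [3, 4, 7, 0, 99]})
df_new = pd.DataFrame({'Combined': [[3, 4, 5], [6, 7], [0, 4]],
                       'Group_name': ['Group 13', 'Group 14', 'Group 18']})

# Build a value -> group dictionary from every list element.
# setdefault keeps the first group seen for a value (assumed tie-break rule).
lookup = {}
for values, group in zip(df_new['Combined'], df_new['Group_name']):
    for value in values:
        lookup.setdefault(value, group)

# Series.map with a dict leaves unmatched values (here 99) as NaN
df_origin['Group_name'] = df_origin['A'].map(lookup)
print(df_origin)
```

Here 4 maps to Group 13 rather than Group 18 because Group 13 is listed first; swap `setdefault` for plain assignment if the last group should win instead.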
Answer 1: (score: 1)
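Two steps: unnest + merge
# Unnest: flatten the Combined lists and repeat each Group_name Length times (df_new is the frame from the question)
df=pd.DataFrame({'Combined':df_new.Combined.sum(),'Group_name':df_new['Group_name'].repeat(df_new.Length)})
# Keep the first group per value, rename the key to A, and merge into df_orig
df_orig.merge(df.groupby('Combined').head(1).rename(columns={'Combined':'A'}))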
Out[77]:
   A  Group_name
0  3  Group 13
1  4  Group 13
2  6  Group 14
3  7  Group 14
4  8  Group 1
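On newer pandas versions the unnest step can probably be written with DataFrame.explode instead of summing lists. The following is a rough sketch, assuming pandas 0.25 or later and a trimmed-down, hypothetical copy of the question's data; `flat` and `result` are names introduced here for illustration:

```python
import pandas as pd

# Hypothetical subset of the question's data
df_orig = pd.DataFrame({'A': [3, 4, 6, 7, 8]})
df_new = pd.DataFrame({
    'Combined': [[8, 9, 112], [3, 4], [6, 7], [8, 9]],
    'Group_name': ['Group 1', 'Group 13', 'Group 14', 'Group 6'],
})

# Unnest: one row per list element
flat = df_new.explode('Combined').rename(columns={'Combined': 'A'})
# explode yields an object column; cast so both merge keys share a dtype
flat['A'] = flat['A'].astype('int64')
# Keep the first group listed for each value, mirroring groupby(...).head(1)
flat = flat.drop_duplicates('A')

# A left merge keeps every row of df_orig, even ones without a match
result = df_orig.merge(flat, on='A', how='left')
print(result)
```

Using how='left' differs from the default inner merge in the answer: unmatched values of A stay in the result with NaN as Group_name, which is closer to how VLOOKUP reports a miss.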