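I want to label the items in one dataframe using a concordance (a lookup table) held in a second dataframe.
I can do this with an iterative approach, but I'm sure there is a more elegant (and faster) way, if only I knew which keyword to search for.
Example: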
import pandas as pd
import numpy as np
#data needing labels, in column A
df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
print(df1)
# A
# 0 one
# 1 one
# 2 two
# 3 three
# 4 two
# 5 two
# 6 one
# 7 three
#concordance, corresponding labels (C) for items (B)
df2 = pd.DataFrame({'B': 'one two three'.split(),
'C': '1 2 3'.split()})
print(df2)
# B C
# 0 one 1
# 1 two 2
# 2 three 3
#new column (D) to contain labels of items in column A
df1['D']=np.nan
#sucky iterative way of doing this
for index, row in df1.iterrows():
    #take the single matching label from df2 and convert it to int
    df1.loc[index,'D']=int(df2[df2['B']==df1.loc[index,'A']]['C'].iloc[0])
print(df1)
# A D
# 0 one 1
# 1 one 1
# 2 two 2
# 3 three 3
# 4 two 2
# 5 two 2
# 6 one 1
# 7 three 3
Answer 0 (score: 1)
Everything became much easier once I discovered merge, which lets you use the second df as a dictionary/concordance:
#set up
import pandas as pd
df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
'C': '1 2 3'.split()})
#one line solution
df1=df1.merge(df2,how='left',left_on='A',right_on='B')
#tidying up to give the answer requested in the original question
df1.drop('B', axis=1, inplace=True) #delete column
df1.columns = ['A', 'D'] #rename C -> D
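As a possible variation on the merge above, the drop and rename steps can be avoided by renaming the concordance columns before merging. This is only a sketch: the names `lookup` and `labelled` are mine, and the `validate` check and the integer conversion are optional extras that are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# Rename the concordance columns so the key matches df1 and the label is already called D
lookup = df2.rename(columns={'B': 'A', 'C': 'D'})

# validate='many_to_one' makes merge raise if an item appears more than once in the concordance
labelled = df1.merge(lookup, how='left', on='A', validate='many_to_one')

# C holds strings ('1', '2', '3'), so convert if integer labels are wanted
labelled['D'] = labelled['D'].astype(int)
print(labelled)
```

A left merge keeps the row order of df1, so the labels line up with the original items.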
Answer 1 (score: 0)
Call set_index on df2 to make column 'B' the index, then call map on df1['A']:
In [55]:
df1['D'] = df1['A'].map(df2.set_index('B')['C'])
df1
Out[55]:
A D
0 one 1
1 one 1
2 two 2
3 three 3
4 two 2
5 two 2
6 one 1
7 three 3
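One thing worth knowing about the map approach is what happens to items that are missing from the concordance. The sketch below assumes a hypothetical extra item, 'four', that is not in the original data, and shows that it comes back as NaN and can be listed afterwards. The names `lookup` and `unmatched` are illustrative only.

```python
import pandas as pd

# 'four' is a made-up item with no entry in the concordance
df1 = pd.DataFrame({'A': 'one one two three four'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# Series indexed by item (B) with the label (C) as its values
lookup = df2.set_index('B')['C']

# Items not found in the lookup are mapped to NaN
df1['D'] = df1['A'].map(lookup)

# List the items that received no label
unmatched = df1.loc[df1['D'].isna(), 'A'].tolist()
print(unmatched)
```

Note that the labels stay as strings here, because column C was built from a split string.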
Answer 2 (score: 0)
If for any reason you don't want to call set_index() on df2, this code should also work.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
'C': '1 2 3'.split()})
df1['D']=np.nan
for item in df1['A'].unique():
    #look up the single label for this item in the concordance
    label = int(df2.loc[df2['B'] == item, 'C'].iloc[0])
    #assign that label to every matching row at once
    df1.loc[df1['A'] == item, 'D'] = label
print(df1)