Question

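I want to label items in a DataFrame based on a concordance held in a second DataFrame.

I can do this with an iterative approach, but I'm sure there is a more elegant (and faster) way, if only I knew which keyword to search for.

Example: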

import pandas as pd
import numpy as np

#data needing labels, in column A
df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})

print(df1)
#        A
# 0    one
# 1    one
# 2    two
# 3  three
# 4    two
# 5    two
# 6    one
# 7  three

#concordance, corresponding labels (C) for items (B)                 
df2 = pd.DataFrame({'B': 'one two three'.split(),
               'C': '1 2 3'.split()})

print(df2)
#        B  C
# 0    one  1
# 1    two  2
# 2  three  3

#new column (D) to contain labels of items in column A
df1['D'] = np.nan

#sucky iterative way of doing this
for index, row in df1.iterrows():
    # take the single matching label from df2 and convert it to int
    df1.loc[index, 'D'] = int(df2.loc[df2['B'] == row['A'], 'C'].iloc[0])

print(df1)
#        A  D
# 0    one  1
# 1    one  1
# 2    two  2
# 3  three  3
# 4    two  2
# 5    two  2
# 6    one  1
# 7  three  3

Answer 1

Everything got much easier once I discovered merge, which lets you use the second DataFrame as a dictionary/concordance:

#set up
import pandas as pd

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

#one line solution
df1=df1.merge(df2,how='left',left_on='A',right_on='B')

#tidying up to give the answer requested in the original question
df1.drop('B', axis=1, inplace=True) #delete column
df1.columns = ['A', 'D'] #rename C -> D
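
A left merge silently leaves NaN for items that have no entry in the concordance, and it duplicates rows if the concordance lists an item twice. The following is a minimal sketch of one way to guard against both, using the `validate` and `indicator` parameters of `merge`. The value 'four' is a hypothetical unmatched item added only for illustration; it is not part of the original data.

```python
import pandas as pd

# 'four' is a hypothetical item with no entry in the concordance
df1 = pd.DataFrame({'A': 'one one two three two two one four'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# validate='many_to_one' raises an error if df2['B'] contains duplicates;
# indicator=True adds a '_merge' column marking rows as 'both' or 'left_only'
merged = df1.merge(df2, how='left', left_on='A', right_on='B',
                   validate='many_to_one', indicator=True)

# items in df1 that found no label in df2
unmatched = merged.loc[merged['_merge'] == 'left_only', 'A'].tolist()

# drop the helper columns and rename C -> D
result = merged.drop(columns=['B', '_merge']).rename(columns={'C': 'D'})
print(unmatched)
print(result)
```

Whether unmatched items should raise an error, be dropped, or keep NaN depends on the data, so the sketch only reports them.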

Answer 2

Call set_index on df2 with column 'B', then call map on df1['A']:

In [55]:
df1['D'] = df1['A'].map(df2.set_index('B')['C'])
df1

Out[55]:
       A  D
0    one  1
1    one  1
2    two  2
3  three  3
4    two  2
5    two  2
6    one  1
7  three  3
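
Note that column C in the example holds strings ('1', '2', '3'), while the loop in the question converts each label with int(). A minimal sketch of the same map approach with that conversion applied to the lookup Series first, assuming every label in C is a valid integer string:

```python
import pandas as pd

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# build the lookup Series (index: item, value: label) and convert labels to int
lookup = df2.set_index('B')['C'].astype(int)

# map each item in A to its label; items missing from the lookup would become NaN
df1['D'] = df1['A'].map(lookup)
print(df1)
```

Converting the small lookup Series once is cheaper than converting the mapped column afterwards, and map requires the values in B to be unique.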

Answer 3

If for any reason you don't want to call set_index() on df2, this code should also work.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

df1['D'] = np.nan

for item in df1['A'].unique():
    # the label in df2 that corresponds to this item
    label = int(df2.loc[df2['B'] == item, 'C'].iloc[0])
    # assign it to every row of df1 holding that item
    df1.loc[df1['A'] == item, 'D'] = label

print(df1)
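
As a quick sanity check, the three approaches from the answers can be compared on the example data. This is a sketch only; the variable names (via_map, via_merge, via_loop) are made up for illustration, and the labels are left as strings so that all three produce the same type.

```python
import pandas as pd

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# Answer 2: map through a lookup Series indexed by B
via_map = df1['A'].map(df2.set_index('B')['C'])

# Answer 1: left merge, then take the label column
via_merge = df1.merge(df2, how='left', left_on='A', right_on='B')['C']

# Answer 3: loop over the unique items and assign with a boolean mask
via_loop = pd.Series(index=df1.index, dtype=object)
for item in df1['A'].unique():
    via_loop[df1['A'] == item] = df2.loc[df2['B'] == item, 'C'].iloc[0]

# all three should yield the same labels in the same row order
same = via_map.tolist() == via_merge.tolist() == via_loop.tolist()
print(same)
```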

Label items in a DataFrame based on the corresponding values in a second DataFrame
