Label items in a data frame based on corresponding values in a second data frame

Time: 2015-11-24 15:56:03

Tags: python pandas

I want to label the items in one data frame using a concordance (an item-to-label lookup table) held in a second data frame.

I can do this with an iterative approach, but I'm sure there is a more elegant (and faster) way, if only I knew which keyword to search for.

Example:

import pandas as pd
import numpy as np

#data needing labels, in column A
df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})

print(df1)
#        A
# 0    one
# 1    one
# 2    two
# 3  three
# 4    two
# 5    two
# 6    one
# 7  three

#concordance, corresponding labels (C) for items (B)                 
df2 = pd.DataFrame({'B': 'one two three'.split(),
               'C': '1 2 3'.split()})

print(df2)
#        B  C
# 0    one  1
# 1    two  2
# 2  three  3

#new column (D) to contain labels of items in column A
df1['D']=np.nan

#sucky iterative way of doing this           
for index, row in df1.iterrows():
    df1.loc[index,'D']=int(df2.loc[df2['B']==row['A'],'C'].iloc[0])

print(df1)
#        A  D
# 0    one  1
# 1    one  1
# 2    two  2
# 3  three  3
# 4    two  2
# 5    two  2
# 6    one  1
# 7  three  3

3 answers:

Answer 0: (score: 1)

Everything became much easier once I discovered merge, which lets you use the second df as a dictionary/concordance:

#set up
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

#one line solution
df1=df1.merge(df2,how='left',left_on='A',right_on='B')

#tidying up to give the answer requested in the original question
df1.drop('B', axis=1, inplace=True) #delete column
df1.columns = ['A', 'D'] #rename C -> D
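
As a variation on the merge above, here is a minimal sketch that renames the concordance columns before merging, so no drop or rename is needed afterwards. The names `lookup` and `labelled` are mine, not from the question, and `validate='many_to_one'` is an optional safety check I have added: it raises an error if an item appears more than once in the concordance.

```python
import pandas as pd

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# Rename the concordance columns so the key matches df1 and the label
# column already has its final name ('lookup' is a hypothetical name).
lookup = df2.rename(columns={'B': 'A', 'C': 'D'})

# Left merge keeps every row of df1 in its original order.
# validate='many_to_one' fails loudly if the concordance has duplicate keys.
labelled = df1.merge(lookup, how='left', on='A', validate='many_to_one')

print(labelled)
```

Note that the labels in D are still strings here, because column C was built from `'1 2 3'.split()`.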

Answer 1: (score: 0)

Call set_index on df2 with column 'B', then call map on df1['A']:

In [55]:
df1['D'] = df1['A'].map(df2.set_index('B')['C'])
df1

Out[55]:
       A  D
0    one  1
1    one  1
2    two  2
3  three  3
4    two  2
5    two  2
6    one  1
7  three  3
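
One thing worth knowing about map is what happens when an item has no entry in the concordance. The sketch below assumes a hypothetical extra item, 'four', that is not in the question's data; the names `lookup` and `unmatched` are also mine. It converts the labels to integers first, since column C holds strings.

```python
import pandas as pd

# 'four' is a made-up item with no entry in the concordance.
df1 = pd.DataFrame({'A': 'one one two three two two one four'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

# Build a Series indexed by item, with integer labels as values.
lookup = df2.set_index('B')['C'].astype(int)

# map leaves NaN wherever an item is missing from the lookup,
# which turns the whole column into floats.
df1['D'] = df1['A'].map(lookup)

# Collect the items that could not be labelled.
unmatched = df1.loc[df1['D'].isna(), 'A'].tolist()
print(unmatched)
```

If every item is guaranteed to have a label, the column can be cast back with `astype(int)` afterwards.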

Answer 2: (score: 0)

If for any reason you don't want to call set_index() on df2, this code should also work.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': 'one one two three two two one three'.split()})
df2 = pd.DataFrame({'B': 'one two three'.split(),
                    'C': '1 2 3'.split()})

df1['D']=np.nan

#label all rows of each distinct item in one assignment
for item in df1['A'].unique():
    #look up the single label for this item in the concordance
    label = int(df2.loc[df2['B'] == item, 'C'].iloc[0])
    df1.loc[df1['A'] == item, 'D'] = label

print(df1)