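I have two DataFrames, each with two columns that really matter: one column holds float64 values and the other holds strings. The two DataFrames have different sizes.
I want to match on the Number column and the Item column at the same time, and then mark which rows of df1 match.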
import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0], 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork']})
df2 = pd.DataFrame({'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0], 'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle', 'box', 'toolkit', 'fork', 'Phone']})
df1
Number Item
0 1.0 Phone
1 3.0 Watch
2 4.0 Pen
3 5.0 Pencil
4 8.0 Pencil
5 12.0 toolkit
6 32.0 box
7 58.0 fork
df2
Number Item
0 3.0 Watch
1 4.0 Pen
2 8.0 Pencil
3 12.0 Eraser
4 15.0 bottle
5 32.0 box
6 54.0 toolkit
7 58.0 fork
8 72.0 Phone
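I tried a for loop, but it runs for a very long time, and it seems like a really bad way to do this. I'm trying to use a masking operation instead, but I'm not sure how to implement it. Please help with the shortest solution possible.
The desired result should look like this: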
Item Matching Number
0 Phone No Match 1.0
1 Watch Matched 3.0
2 Pen Matched 4.0
3 Pencil No Match 5.0
4 Pencil Matched 8.0
5 toolkit No Match 12.0
6 box Matched 32.0
7 fork Matched 58.0
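Answer 0 (score: 3)
If the problem is merging on float values, you can multiply by 1000 and convert to integers first, then merge with a left join. Matching on the raw floats can fail because the underlying floating-point representation of the two columns may differ slightly: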
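import pandas as pd

# Scale to integer keys; round() before astype so 2999.9999... becomes 3000, not 2999
df1['Number1'] = df1['Number'].mul(1000).round().astype(int)
df2['Number1'] = df2['Number'].mul(1000).round().astype(int)
# Left join on (Item, Number1); indicator=True adds a _merge column
df = pd.merge(df1, df2.drop(columns='Number'), how='left', on=['Item', 'Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only': 'No Match', 'both': 'Match'})
df = df.drop(['Number1', '_merge'], axis=1)
print(df)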
Number Item Matching
0 1.0 Phone No Match
1 3.0 Watch Match
2 4.0 Pen Match
3 5.0 Pencil No Match
4 8.0 Pencil Match
5 12.0 toolkit No Match
6 32.0 box Match
7 58.0 fork Match
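To illustrate the precision concern above, here is a minimal sketch with hypothetical one-row frames (`left`, `right` and the `Key` column are made up for the demo). It assumes the values only need to agree to three decimal places, which is what the factor of 1000 encodes.

```python
import pandas as pd

# Hypothetical keys: 0.1 + 0.2 is stored as 0.30000000000000004, not 0.3
left = pd.DataFrame({'Number': [0.1 + 0.2], 'Item': ['Pen']})
right = pd.DataFrame({'Number': [0.3], 'Item': ['Pen']})

# Merging on the raw floats: the keys differ in the last bits, so no match
raw = pd.merge(left, right, how='left', on=['Item', 'Number'], indicator=True)
print(raw['_merge'].tolist())  # ['left_only']

# Scale, round to the nearest integer, then cast; tiny errors disappear
for frame in (left, right):
    frame['Key'] = frame['Number'].mul(1000).round().astype('int64')

scaled = pd.merge(left, right.drop(columns='Number'), how='left',
                  on=['Item', 'Key'], indicator=True)
print(scaled['_merge'].tolist())  # ['both']
```

Rounding before the cast matters: `astype` truncates toward zero, so a product that lands just below an integer would otherwise drop to the wrong key.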
Answer 1 (score: 2)
You are looking for a left merge with indicator=True:
# With no `on` argument, merge joins on all shared columns (Number and Item)
res = pd.merge(df1, df2, how='left', indicator=True)
print(res)
Item Number _merge
0 Phone 1.0 left_only
1 Watch 3.0 both
2 Pen 4.0 both
3 Pencil 5.0 left_only
4 Pencil 8.0 both
5 toolkit 12.0 left_only
6 box 32.0 both
7 fork 58.0 both
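In general, avoid explicit for loops when a dedicated method exists, since such methods are usually optimized for performance. If needed, you can replace the indicator strings through a dictionary mapping:
d = {'left_only': 'No Match', 'both': 'Matched'}
res['_merge'] = res['_merge'].map(d)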
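Putting this answer's pieces together, here is a self-contained sketch that rebuilds the question's frames, merges, maps the indicator, and reorders the columns to the requested layout. The `drop_duplicates` step is an added safeguard, not part of the original answer: if df2 held the same (Number, Item) pair twice, a plain left merge would duplicate the matching df1 row.

```python
import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0],
                    'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil',
                             'toolkit', 'box', 'fork']})
df2 = pd.DataFrame({'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0],
                    'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle',
                             'box', 'toolkit', 'fork', 'Phone']})

# Keep each (Number, Item) pair once so the merge cannot add rows to df1
keys = df2[['Number', 'Item']].drop_duplicates()

# Left merge keeps every df1 row in order; _merge says whether the pair was found
res = pd.merge(df1, keys, how='left', on=['Number', 'Item'], indicator=True)

# A left merge only produces 'left_only' or 'both'
res['Matching'] = res['_merge'].map({'left_only': 'No Match', 'both': 'Matched'})

# Reorder to the layout shown in the question
res = res[['Item', 'Matching', 'Number']]
print(res)
```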
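Answer 2 (score: 1)
You can get close to the desired DataFrame with some simple loc and isin calls, as follows. Note that each isin checks its own column independently rather than the (Number, Item) pair, which is why 12.0 toolkit is flagged Matched below even though df2 has no such row: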
df = df1.copy()
# Default every row to 'No Match', then overwrite the rows that pass both checks
df['Matching'] = 'No Match'
df.loc[df.Number.isin(df2.Number) & df.Item.isin(df2.Item), 'Matching'] = 'Matched'
Number Item Matching
1.0 Phone No Match
3.0 Watch Matched
4.0 Pen Matched
5.0 Pencil No Match
8.0 Pencil Matched
12.0 toolkit Matched
32.0 box Matched
58.0 fork Matched
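A pairwise variant of the same mask idea, assuming a MultiIndex key is acceptable: building a MultiIndex from the (Number, Item) columns makes `isin` compare whole pairs instead of single columns, so 12.0 toolkit comes out as No Match, as the question's desired result requires. The names `left_keys`, `right_keys` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0],
                    'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil',
                             'toolkit', 'box', 'fork']})
df2 = pd.DataFrame({'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0],
                    'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle',
                             'box', 'toolkit', 'fork', 'Phone']})

# One (Number, Item) tuple per row; MultiIndex.isin compares whole tuples
left_keys = pd.MultiIndex.from_frame(df1[['Number', 'Item']])
right_keys = pd.MultiIndex.from_frame(df2[['Number', 'Item']])
mask = left_keys.isin(right_keys)  # boolean NumPy array, one entry per df1 row

df = df1.copy()
df['Matching'] = np.where(mask, 'Matched', 'No Match')
print(df)
```

This stays a single vectorized mask, with no Python-level loop over rows, and unlike a merge it can never change the number of rows in df1.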