Get a subset of a DataFrame based on values in another DataFrame (Python)

Time: 2019-06-16 06:56:21

Tags: python pandas dataframe

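I have two DataFrames. Each row of DataFrame A is a package of products, and DataFrame B lists product IDs together with the IDs of the sellers that offer them.

DataFrame A: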

package_name | product_1 | product_2 | product_3 | product_4
package a    |     12    |     15    |    NaN    |    NaN
package b    |     17    |     16    |    14     |    NaN
package c    |     12    |     11    |    17     |    19

DataFrame B:

product_id | seller_id
12         | seller1
15         | seller1
12         | seller2
15         | seller2
17         | seller3
16         | seller3
14         | seller3

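(Each product can have several sellers, and each seller can have several products.)

I want to know which sellers offer all of the products in a package (based on DataFrame A). This is the expected result:

DataFrame C: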

package_name | product_1 | product_2 | product_3 | product_4 | seller_id
package a    |     12    |     15    |    NaN    |    NaN    | seller1
package a    |     12    |     15    |    NaN    |    NaN    | seller2
package b    |     17    |     16    |    14     |    NaN    | seller3

seller1 and seller2 each carry all of the products in package a, and seller3 carries all of the products in package b.

How can I produce DataFrame C?

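1 answer:

Answer 0: (score: 2)

The idea is to build a helper DataFrame of the (package, seller) pairs where the package's product set is a subset of the seller's product set, then attach it to A with DataFrame.merge using a right join. B is extended here with a seller4 to show an additional match: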

print (B)
   product_id seller_id
0          12   seller1
1          15   seller1
2          12   seller2
3          15   seller2
4          17   seller3
5          16   seller3
6          14   seller3
7          12   seller4
8          15   seller4
9          14   seller4

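import pandas as pd

# Index by package name so each row holds only product IDs.
A = A.set_index('package_name')
# Build a set of product IDs per row; y == y is False for NaN, so empty slots are skipped.
f = lambda x: {int(y) for y in x if y == y}
a = A.apply(f, axis=1).to_dict()
#print (a)

# Build a set of product IDs per seller.
b = B.groupby('seller_id')['product_id'].apply(set).to_dict()
#print (b)

# Keep each (package, seller) pair where the seller carries every product in the package.
c = [(k, k1) for k, v in a.items() for k1, v1 in b.items() if v.issubset(v1)]
#print (c)

C1 = pd.DataFrame(c, columns=['package_name','seller_id'])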
print (C1)
  package_name seller_id
0    package a   seller1
1    package a   seller2
2    package a   seller4
3    package b   seller3

# package_name is the index of A here; merge accepts an index level name in `on`.
C = A.merge(C1, on='package_name', how='right')
print (C)
  package_name  product_1  product_2  product_3  product_4 seller_id
0    package a         12         15        NaN        NaN   seller1
1    package a         12         15        NaN        NaN   seller2
2    package a         12         15        NaN        NaN   seller4
3    package b         17         16       14.0        NaN   seller3
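
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same subset-matching idea. The sample frames are rebuilt from the tables above (including the extra seller4 rows), and the function name match_packages_to_sellers is a hypothetical helper, not part of pandas. It uses an inner join instead of the right join above; since the helper frame only contains package names that exist in A, the two should give the same rows here.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question, plus the seller4 rows used in the answer.
A = pd.DataFrame({
    "package_name": ["package a", "package b", "package c"],
    "product_1": [12, 17, 12],
    "product_2": [15, 16, 11],
    "product_3": [np.nan, 14, 17],
    "product_4": [np.nan, np.nan, 19],
})
B = pd.DataFrame({
    "product_id": [12, 15, 12, 15, 17, 16, 14, 12, 15, 14],
    "seller_id": ["seller1", "seller1", "seller2", "seller2",
                  "seller3", "seller3", "seller3",
                  "seller4", "seller4", "seller4"],
})


def match_packages_to_sellers(packages, offers):
    """Hypothetical helper: one row per (package, seller) where the
    seller carries every product in the package."""
    indexed = packages.set_index("package_name")
    # Product IDs per package as a set, with the NaN padding dropped.
    package_sets = {
        name: set(row.dropna().astype(int))
        for name, row in indexed.iterrows()
    }
    # Product IDs per seller as a set.
    seller_sets = offers.groupby("seller_id")["product_id"].apply(set).to_dict()
    # A seller qualifies when the package's set is a subset of the seller's set.
    pairs = [
        (package, seller)
        for package, products in package_sets.items()
        for seller, stock in seller_sets.items()
        if products.issubset(stock)
    ]
    matches = pd.DataFrame(pairs, columns=["package_name", "seller_id"])
    # Attach the original product columns to each matching pair.
    return packages.merge(matches, on="package_name", how="inner")


C = match_packages_to_sellers(A, B)
print(C)
```

Note that package c never appears in the result, because no seller in B carries all four of its products. The pairwise comparison is quadratic in the number of packages and sellers, which is fine for small tables but may need a different approach for large ones.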