Question

I have two DataFrames: one is 'recipe', which holds combinations of ingredients, and the other is 'like', which holds the popular combinations.

recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
recipe
         A      B
0  chicken  sweet
1     beef    hot
2     pork  salty
3      egg    hot
4  chicken  sweet
5      egg  salty
6     beef    hot 

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like
      A      B
0  beef    hot
1   egg  salty

How can I add a column 'C' to recipe so that a row gets the value 'yes' if its combination is listed in 'like', and 'no' otherwise?

The result I want is:

recipe
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes

The problem is that both of my DataFrames are large. I can't manually pick out the items in 'like' and assign the 'yes' label in 'recipe'. Is there an easy way to do this?

Answer 1

You can use merge and numpy.where:

df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left')
print(df)
         A      B     _merge
0  chicken  sweet  left_only
1     beef    hot       both
2     pork  salty  left_only
3      egg    hot  left_only
4  chicken  sweet  left_only
5      egg  salty       both
6     beef    hot       both

df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')

print(df[['A','B','C']])
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes

Comparing with df['_merge'] == 'both' is faster than using np.in1d:

In [460]: %timeit np.where(np.in1d(df['_merge'],'both'), 'yes', 'no')
100 loops, best of 3: 2.22 ms per loop

In [461]: %timeit np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 652 µs per loop

Answer 2

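You can add a column C with the value 'yes' to like, then merge recipe with like. Rows that match will have 'yes' in column C, and rows without a match will have NaNs. You can then use fillna to replace the NaNs with 'no':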

import pandas as pd
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like['C'] = 'yes'
result = pd.merge(recipe, like, how='left').fillna('no')
print(result)

yields

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
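
One thing to watch with this approach: calling fillna('no') on the whole merged frame fills every NaN, not only those in C. Below is a minimal sketch that restricts the fill to column C. The `note` column and the shortened `recipe` are hypothetical, added only to illustrate the point.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({
    'A': ['chicken', 'beef', 'pork', 'egg'],
    'B': ['sweet', 'hot', 'salty', 'salty'],
    'note': ['grill', np.nan, 'roast', np.nan],  # hypothetical column with gaps
})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# Tag every liked combination, then left-merge on the key columns only.
liked = like.assign(C='yes')
result = recipe.merge(liked, on=['A', 'B'], how='left')

# Fill only column C, so missing values in other columns are left alone.
result['C'] = result['C'].fillna('no')
print(result)
```

Using like.assign(C='yes') instead of like['C'] = 'yes' also leaves the original like frame unmodified.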

Answer 3

You can use set_value, matching on A and B:

recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes')
recipe.fillna('no')

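Which will give you (note that row 3, egg/hot, is also labeled yes, because A and B are matched independently):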

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot  yes
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
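
To match A and B as a pair rather than column by column, one option is to compare (A, B) tuples through a MultiIndex. This is a sketch of an alternative technique, not the answer's original method. It also avoids set_value, which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# Build (A, B) pairs so the two columns are compared together, not separately.
recipe_pairs = pd.MultiIndex.from_frame(recipe[['A', 'B']])
like_pairs = pd.MultiIndex.from_frame(like[['A', 'B']])

# isin on a MultiIndex tests whole tuples and returns a boolean array.
mask = recipe_pairs.isin(like_pairs)
recipe['C'] = np.where(mask, 'yes', 'no')
print(recipe)
```

With this version, row 3 (egg, hot) is labeled no, since that exact pair does not appear in like.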

Note: these results do not mean that my answer is better than the others, or vice versa.

Using set_value:

%timeit recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes'); recipe.fillna('no')
100 loops, best of 3: 2.69 ms per loop

Using merge and creating a new df:

%timeit df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left'); df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
100 loops, best of 3: 8.42 ms per loop

Using only the np.where step, with the merge already done:

%timeit df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 187 µs per loop

Again, it really depends on what you are timing. Just be careful with duplicate data.
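
The duplicate-data caveat can be sketched as follows, assuming the concern is repeated pairs in like: a left merge emits one output row per match, so a repeated pair multiplies the matching recipe rows. The small frames here are made up for illustration.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({'A': ['beef', 'egg', 'pork'],
                       'B': ['hot', 'salty', 'salty']})
# 'like' with an accidental repeat of (beef, hot).
like = pd.DataFrame({'A': ['beef', 'beef', 'egg'],
                     'B': ['hot', 'hot', 'salty']})

# A left merge emits one row per match, so the repeated pair duplicates a recipe row.
naive = recipe.merge(like, on=['A', 'B'], how='left', indicator=True)
print(len(recipe), len(naive))

# Dropping duplicate pairs first keeps one output row per recipe row.
deduped = like.drop_duplicates(subset=['A', 'B'])
df = recipe.merge(deduped, on=['A', 'B'], how='left', indicator=True)
df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
print(df[['A', 'B', 'C']])
```

Duplicates inside recipe itself are harmless here, since each of its rows still produces exactly one output row once like is deduplicated.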

How to assign labels based on a list in a python pandas.DataFrame?

3 answers: