I have two DataFrames: one is 'recipe', which holds combinations of ingredients, and the other is 'like', which contains the popular combinations.
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
recipe
         A      B
0  chicken  sweet
1     beef    hot
2     pork  salty
3      egg    hot
4  chicken  sweet
5      egg  salty
6     beef    hot
like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like
      A      B
0  beef    hot
1   egg  salty
How can I add a column 'C' to recipe so that a row gets the value 'yes' if its combination is listed in 'like', and 'no' otherwise?
The result I want is
recipe
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
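The problem is that both of my DataFrames are large. I can't manually pick out the items in 'like' and assign the 'yes' label in 'recipe'. Is there an easy way to do this?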
Answer 0 (score: 2)
You can use merge and numpy.where:
df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left')
print(df)
         A      B     _merge
0  chicken  sweet  left_only
1     beef    hot       both
2     pork  salty  left_only
3      egg    hot  left_only
4  chicken  sweet  left_only
5      egg  salty       both
6     beef    hot       both
df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
print(df[['A','B','C']])
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
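For reference, the two steps above can be put together as one self-contained Python 3 script. This is a minimal sketch on the question's sample data; dropping the helper `_merge` column at the end is an addition for tidiness and not part of the original answer.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# A left merge keeps every row of recipe; indicator=True adds a _merge
# column saying whether the (A, B) pair was also found in like.
df = pd.merge(recipe, like, on=['A', 'B'], how='left', indicator=True)

# 'both' means the pair exists in like; all other rows are 'left_only'.
df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')

# Drop the helper column (an addition, not in the original answer).
df = df.drop(columns='_merge')
print(df)
```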
Comparing with df['_merge'] == 'both' is faster than using np.in1d:
In [460]: %timeit np.where(np.in1d(df['_merge'],'both'), 'yes', 'no')
100 loops, best of 3: 2.22 ms per loop
In [461]: %timeit np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 652 µs per loop
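Answer 1 (score: 1)
You can add a column C filled with 'yes' to like, and then merge recipe with like.
Rows that match will have 'yes' in column C, and rows without a match will have NaNs. You can then use fillna to replace the NaNs with 'no':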
import pandas as pd
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like['C'] = 'yes'
result = pd.merge(recipe, like, how='left').fillna('no')
print(result)
yields
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
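One thing to watch with this approach: calling fillna('no') on the whole result also replaces NaNs in any other column that happens to contain them. A possible variation, offered as a sketch rather than as the answer's own method, fills only column C and leaves like unmodified. The name `flagged` is hypothetical.

```python
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# assign returns a new frame, so like itself does not gain a C column.
flagged = like.assign(C='yes')

result = pd.merge(recipe, flagged, on=['A', 'B'], how='left')

# Fill only the new column; NaNs elsewhere (if any) stay untouched.
result['C'] = result['C'].fillna('no')
print(result)
```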
Answer 2 (score: 1)
You can use set_value, matching on A and B:
recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes')
recipe.fillna('no')
Which will give you:
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot  yes
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
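Note that the output above marks row 3 (egg, hot) as 'yes', although that pair is not in like. Checking A and B independently with isin matches any row whose A appears somewhere in like.A and whose B appears somewhere in like.B. In addition, set_value was deprecated in pandas 0.21 and removed in 1.0, so the snippet does not run on current versions. The sketch below shows the difference and one way to compare whole (A, B) pairs; the MultiIndex comparison is a swapped-in technique, not this answer's method, and the names `loose`, `recipe_pairs`, `like_pairs` and `strict` are hypothetical.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# Column-by-column check: True for row 3 (egg, hot), because 'egg' is in
# like.A and 'hot' is in like.B, even though the pair itself is not in like.
loose = recipe['A'].isin(like['A']) & recipe['B'].isin(like['B'])

# Pair-wise check: treat each (A, B) row as a single tuple.
recipe_pairs = pd.MultiIndex.from_frame(recipe[['A', 'B']])
like_pairs = list(zip(like['A'], like['B']))
strict = recipe_pairs.isin(like_pairs)

recipe['C'] = np.where(strict, 'yes', 'no')
print(recipe)
```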
Note: these timings do not mean that my answer is better than the others, or vice versa.
Using set_value:
%timeit recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes'); recipe.fillna('no')
100 loops, best of 3: 2.69 ms per loop
Using merge and creating a new df:
%timeit df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left'); df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
100 loops, best of 3: 8.42 ms per loop
Using merge only (timing just the np.where step on the already-merged df):
%timeit df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 187 µs per loop
Again, it really depends on what you are timing. Just be careful with duplicated data.
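To illustrate the point about duplicated data: if like lists the same pair more than once, a left merge repeats every matching recipe row, so the result no longer lines up with recipe. A small sketch, assuming repeated rows in like carry no extra meaning and can be dropped; `like_dup`, `inflated` and `deduped` are hypothetical names.

```python
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})

# Hypothetical like frame in which (beef, hot) appears twice.
like_dup = pd.DataFrame({'A': ['beef', 'egg', 'beef'],
                         'B': ['hot', 'salty', 'hot']})

# Each of the two (beef, hot) rows in recipe matches two rows in like_dup,
# so the merged frame has 9 rows instead of 7.
inflated = pd.merge(recipe, like_dup, on=['A', 'B'], how='left', indicator=True)

# Dropping duplicate pairs first keeps one output row per recipe row.
deduped = pd.merge(recipe, like_dup.drop_duplicates(), on=['A', 'B'],
                   how='left', indicator=True)

print(len(inflated), len(deduped))
```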