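I have a pandas DataFrame named "df_business". Below is a sample from it. I want to filter the DataFrame down to the records that contain "Restaurant" in the categories column. Can anyone suggest how to do this?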
Code:
print(df_business[1:3])
Sample Data:
               address                                         attributes  \
1       2824 Milton Rd  {u'GoodForMeal': {u'dessert': False, u'latenig...
2  337 Danforth Avenue  {u'BusinessParking': {u'garage': False, u'stre...

              business_id                                         categories  \
1  mLwM-h2YhXl2NCgdS84_Bw  [Food, Soul Food, Convenience Stores, Restaura...
2  v2WhjAB3PIBA8J8VxG3wEg                               [Food, Coffee & Tea]

        city                                              hours  is_open  \
1  Charlotte  {u'Monday': u'10:00-22:00', u'Tuesday': u'10:0...        0
2    Toronto  {u'Monday': u'10:00-19:00', u'Tuesday': u'10:0...        0

    latitude   longitude                                name  neighborhood  \
1  35.236870  -80.741976  South Florida Style Chicken & Ribs      Eastland
2  43.677126  -79.353285                    The Tea Emporium     Riverdale

   postal_code  review_count  stars  state
1        28215             4    4.5     NC
2      M4K 1N7             7    4.5     ON
Answer 0 (score: 4)
Convert your categories column to strings and use str.contains:
m = df_business['categories'].astype(str).str.contains('Restaurant')
df_business = df_business.loc[m]
If you are worried about partial matches, adding word-boundary checks to your regex may help:
r'\bRestaurant\b'
This should make the match a little more robust against false positives.
Borrowing jez's data (thanks!):
In [1864]: df_business
           categories  review_count
0  [Restaurant, Food]             4
1              [Food]             7
m = df_business['categories'].astype(str).str.contains(r'\bRestaurant\b')
m
0     True
1    False
Name: categories, dtype: bool
df_business = df_business.loc[m]
df_business
           categories  review_count
0  [Restaurant, Food]             4
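To see what the word boundary actually changes, here is a small sketch comparing the plain and bounded patterns on the string form of each list. The categories 'Restaurants' and 'Restaurant Supplies' are hypothetical values chosen for illustration. It shows that \b rejects the plural 'Restaurants' (which may be a false negative if your data uses that spelling) but still accepts 'Restaurant Supplies', so word boundaries reduce false positives rather than eliminate them.

```python
import pandas as pd

# Hypothetical sample data: 'Restaurants' and 'Restaurant Supplies' are made up
# to show how substring matching behaves on the string form of a list.
df = pd.DataFrame({
    'categories': [['Restaurant', 'Food'], ['Restaurants'],
                   ['Restaurant Supplies'], ['Food']],
})

# astype(str) turns each list into text such as "['Restaurant', 'Food']".
as_text = df['categories'].astype(str)

# Plain substring search: matches 'Restaurant' anywhere in the text.
plain = as_text.str.contains('Restaurant')

# Word-boundary search: 'Restaurant' must not touch another word character.
bounded = as_text.str.contains(r'\bRestaurant\b')

print(plain.tolist())    # [True, True, True, False]
print(bounded.tolist())  # [True, False, True, False]
```

If you need an exact category match, testing list membership directly avoids regex edge cases entirely.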
Answer 1 (score: 4)
Option 1
Join the elements of each list together and search for 'Restaurant':
df_business[
df_business.categories.str.join('').str.contains('Restaurant')]
           categories  review_count
0  [Restaurant, Food]             4
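A possible variation on Option 1, not part of the original answer: joining with a separator such as '|' keeps text from adjacent list elements from running together into an accidental match, and na=False treats rows without a list as non-matches instead of producing NaN in the mask. The None row below is a hypothetical missing value added to show that behaviour.

```python
import pandas as pd

# Sample data in the style of the answers; the None row is a hypothetical
# business with no categories.
df_business = pd.DataFrame({'categories': [['Restaurant', 'Food'], ['Food'], None],
                            'review_count': [4, 7, 2]})

# Join each list with a separator; missing values come back as NaN.
joined = df_business['categories'].str.join('|')

# na=False makes rows with no category list evaluate to False.
mask = joined.str.contains('Restaurant', na=False)
print(df_business[mask])
```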
Option 2
Find the index values of the rows whose list contains the element 'Restaurant':
mask = np.concatenate(df_business.categories) == 'Restaurant'
idx = df_business.index.repeat(df_business.categories.str.len())
df_business.loc[np.unique(idx[mask])]
           categories  review_count
0  [Restaurant, Food]             4
Setup
Borrowed from @jezrael:
import numpy as np
import pandas as pd
df_business = pd.DataFrame({'categories':[['Restaurant','Food'],['Food']],
'review_count':[4,7]})
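On pandas 0.25 or later, Series.explode can stand in for the np.concatenate / index.repeat pairing in Option 2. This is a swapped-in technique rather than the answer's own: explode gives each list element its own row while keeping the original index label, so an exact equality test followed by unique() recovers the matching rows.

```python
import pandas as pd

df_business = pd.DataFrame({'categories': [['Restaurant', 'Food'], ['Food']],
                            'review_count': [4, 7]})

# One row per list element; index labels repeat for multi-element lists.
exploded = df_business['categories'].explode()

# Labels of rows where some element equals 'Restaurant' exactly.
idx = exploded[exploded == 'Restaurant'].index.unique()
print(df_business.loc[idx])
```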
Answer 2 (score: 3)
If the values in the categories column are lists, use the in operator:
df_business = df_business[df_business['categories'].apply(lambda x: 'Restaurant' in x)]
Or:
df_business = df_business[df_business['categories'].astype(str).str.contains('Restaurant')]
Sample:
df_business = pd.DataFrame({'categories':[['Restaurant','Food'],['Food']],
'review_count':[4,7]})
print (df_business)
           categories  review_count
0  [Restaurant, Food]             4
1              [Food]             7
df_business = df_business[df_business['categories'].apply(lambda x: 'Restaurant' in x)]
print (df_business)
           categories  review_count
0  [Restaurant, Food]             4
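One caveat with the apply version: if some rows hold a missing value instead of a list, 'Restaurant' in x raises a TypeError. A minimal sketch of a guarded lambda, assuming missing categories appear as None or NaN (the None row here is hypothetical):

```python
import pandas as pd

# Hypothetical sample with one business that has no categories.
df_business = pd.DataFrame({'categories': [['Restaurant', 'Food'], ['Food'], None],
                            'review_count': [4, 7, 2]})

# Only test membership when the cell really is a list; anything else is False.
mask = df_business['categories'].apply(
    lambda x: isinstance(x, list) and 'Restaurant' in x)
print(df_business[mask])
```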