Not having any luck here. I need to use a free-text field in one DataFrame to look up different columns in a second DataFrame:
df1 = pd.read_csv('Hotel_reviews.csv')
...
user:       Review:
Julie       'Sheets were dirty'
Samantha    'Meal arrived cold'
Rachel      'Cocktails were delicious'
...
Imagine the above ^ at a much larger scale.
df2 = pd.DataFrame({'Keyword': ['Sheets', 'Cocktails', 'Meal'], 'Department': ['Bedrooms', 'Restaurant', 'Restaurant'], 'Issue Type': ['Beds', 'Drinks', 'Food']})
I have tried a lot of ways to get to this result:
df3 =
user:       Review:                       Department:     Issue Type:
Julie       'Sheets were dirty'           'Bedrooms'      'Beds'
Samantha    'Meal arrived cold'           'Restaurant'    'Food'
Rachel      'Cocktails were delicious'    'Restaurant'    'Drinks'
Here is what I have tried:
def find_dept(review):
    words = review.split(' ')
    for word in words:
        if word.isin(df2['Keyword']):
            return df2[df2['word'] ==word]['Department']
dept = df['Review'].apply(find_dept)

for dept in df2['Department']:
    if dept.isin(review):
        return True

review_dict = df2.to_dict('series')
def r_dict(review):
    return review_dict[review]
dept = df['Review'].apply(r_dict)
Needless to say, I'm struggling...
Apologies for the poor formatting. This is a made-up example and my caffeine levels are dropping.
Answer 0 (score: 2)
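Here is one way. The idea is to convert your mapping dictionary to the format keyword: (department, issue).
Then use a generator expression to find the first match, iterating over the new dictionary.
Finally, split the resulting series of tuples into 2 columns via pd.Series.apply(pd.Series).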
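The tuple-splitting step can be tried in isolation. This is a minimal sketch on a hypothetical two-row Series (`pairs`, `split`, and `split_fast` are illustrative names, not part of the question); it also shows an equivalent construction via `tolist()`, which avoids creating one Series per row.

```python
import pandas as pd

# Hypothetical Series of (department, issue) tuples, like the mapping step returns.
pairs = pd.Series([('Bedrooms', 'Beds'), ('Restaurant', 'Food')])

# Each tuple becomes a row; tuple positions become columns 0 and 1.
split = pairs.apply(pd.Series)
split.columns = ['Department', 'IssueType']

# Equivalent result built directly from the list of tuples.
split_fast = pd.DataFrame(pairs.tolist(), index=pairs.index,
                          columns=['Department', 'IssueType'])
```

On large frames the `tolist()` form is usually the quicker of the two, since `apply(pd.Series)` builds a new Series for every row.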
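Note that dictionaries were not guaranteed to be ordered before Python 3.7. On older versions, when a review matches several keywords, which match gets picked is left to chance. If you want to search in a specific order, use an ordered dictionary (see collections.OrderedDict).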
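To illustrate the ordering point, here is a small sketch in which a review contains two keywords. The priority order and the helper name `first_match` are assumptions made for the example; the keyword listed first in the ordered dictionary wins.

```python
from collections import OrderedDict

# Keywords in the order they should be tried (a hypothetical priority).
ordered = OrderedDict([
    ('Meal', ('Restaurant', 'Food')),
    ('Sheets', ('Bedrooms', 'Beds')),
    ('Cocktails', ('Restaurant', 'Drinks')),
])

def first_match(review, mapping=ordered):
    # next() yields the first keyword found in the review, or None if there is none.
    key = next((k for k in mapping if k in review), None)
    # Fall back to a pair of None values so the result always has two fields.
    return mapping.get(key, (None, None))

result = first_match('Sheets were dirty and the Meal arrived cold')
no_hit = first_match('Lovely view')
```

Here `result` comes from the 'Meal' entry, even though 'Sheets' appears earlier in the text, because 'Meal' is tried first.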
import pandas as pd

df = pd.DataFrame([['Julie', 'Sheets were dirty'],
                   ['Samantha', 'Meal arrived cold'],
                   ['Rachel', 'Cocktails were delicious']],
                  columns=['User', 'Review'])

d = {'Keyword': ['Sheets', 'Cocktails', 'Meal'],
     'Department': ['Bedrooms', 'Restaurant', 'Restaurant'],
     'Issue Type': ['Beds', 'Drinks', 'Food']}

# Reshape the mapping to keyword -> (department, issue).
d2 = {key: (dep, iss) for key, dep, iss in
      zip(d['Keyword'], d['Department'], d['Issue Type'])}

def mapper(x):
    # Return the tuple for the first keyword found in the review, else None.
    return d2.get(next((i for i in d2 if i in x), None))

# Split the series of tuples into two columns.
df[['Department', 'IssueType']] = df['Review'].apply(mapper).apply(pd.Series)
Result:
       User                    Review  Department IssueType
0     Julie         Sheets were dirty    Bedrooms      Beds
1  Samantha         Meal arrived cold  Restaurant      Food
2    Rachel  Cocktails were delicious  Restaurant    Drinks
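As an alternative sketch (a different technique from the dictionary approach above, not the answerer's method), the keyword can be pulled out with a regular expression and joined to the lookup table with a merge. The names `lookup`, `pattern`, and `out` are illustrative. Note that `str.extract` picks the keyword that appears first in the review text, not the one listed first in the lookup table.

```python
import re
import pandas as pd

df = pd.DataFrame({'User': ['Julie', 'Samantha', 'Rachel'],
                   'Review': ['Sheets were dirty', 'Meal arrived cold',
                              'Cocktails were delicious']})
lookup = pd.DataFrame({'Keyword': ['Sheets', 'Cocktails', 'Meal'],
                       'Department': ['Bedrooms', 'Restaurant', 'Restaurant'],
                       'Issue Type': ['Beds', 'Drinks', 'Food']})

# One alternation pattern over all keywords, e.g. (Sheets|Cocktails|Meal).
pattern = '(' + '|'.join(re.escape(k) for k in lookup['Keyword']) + ')'

# First keyword occurring in each review; NaN when no keyword is present.
df['Keyword'] = df['Review'].str.extract(pattern, expand=False)

# A left merge keeps every review, whether or not a keyword was found.
out = df.merge(lookup, on='Keyword', how='left').drop(columns='Keyword')
```

Reviews without any keyword end up with missing values in the Department and Issue Type columns rather than being dropped.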