Pandas: using a DataFrame as a dictionary or lookup

Time: 2018-03-30 11:17:16

Tags: python pandas dictionary dataframe lookup

Not winning here. I need to use a free-text field in one DataFrame to look up different columns in a second DataFrame:

df1 = pd.read_csv('Hotel_reviews.csv')

...
user:     Review:
Julie     'Sheets were dirty'
Samantha  'Meal arrived cold'
Rachel    'Cocktails were delicious'
...

Imagine a large amount of data like the above.
df2 = pd.DataFrame({'Keyword': ['Sheets', 'Cocktails', 'Meal'],
                    'Department': ['Bedrooms', 'Restaurant', 'Restaurant'],
                    'Issue Type': ['Beds', 'Drinks', 'Food']})

I've tried many approaches to get to this result:

df3 =
user:     Review:                     Department:     Issue Type:
Julie     'Sheets were dirty'         'Bedrooms'      'Beds'
Samantha  'Meal arrived cold'         'Restaurant'    'Food'
Rachel    'Cocktails were delicious'  'Restaurant'    'Drinks'

Here is what I've tried:

TRY1

def find_dept(review):
    words = review.split(' ')
    for word in words:
        if word.isin(df2['Keyword']):
             return df2[df2['word'] ==word]['Department']

dept = df['Review'].apply(find_dept)

TRY2

for dept in df2['Department']:    
     if dept.isin(review):
          return True

TRY3

review_dict = df2.to_dict('series')
def r_dict(review):
    return review_dict[review]

dept = df['Review'].apply(r_dict)

Needless to say, I'm struggling...

Apologies for the poor formatting; this is a made-up example and my caffeine levels are dropping.

1 answer:

Answer 0 (score: 2)

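This is one way. The idea is to convert your mapping dictionary into keyword: (department, issue) format.

Then loop through the new dictionary with a generator expression to find the first match.

Finally, split the resulting series of tuples into 2 columns via pd.Series.apply(pd.Series).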

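Note that dictionaries are not considered ordered (insertion order is only guaranteed from Python 3.7 onward). So when a review contains several keywords, treat which match gets selected as down to chance. If you want to search in a particular order, use an ordered dictionary (look up collections.OrderedDict).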
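As a rough illustration of the ordering note, here is a minimal sketch of a first-match search over a collections.OrderedDict. The priority order ('Meal' before 'Sheets' before 'Cocktails'), the name ordered_mapper, and the sample review are assumptions made up for this example, not part of the question.

```python
from collections import OrderedDict

# Hypothetical priority order: earlier keywords win when a review
# mentions more than one of them.
ordered = OrderedDict([
    ('Meal', ('Restaurant', 'Food')),
    ('Sheets', ('Bedrooms', 'Beds')),
    ('Cocktails', ('Restaurant', 'Drinks')),
])

def ordered_mapper(review):
    """Return (department, issue) for the first keyword, in priority
    order, that occurs in the review; None if no keyword occurs."""
    key = next((k for k in ordered if k in review), None)
    return ordered.get(key)

# Both 'Sheets' and 'Meal' occur here; 'Meal' is listed first, so it wins.
both = ordered_mapper('Sheets were dirty and the Meal arrived cold')
```

With this ordering, a review mentioning both sheets and a meal is routed to the restaurant; reordering the entries changes which department wins.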

import pandas as pd

df = pd.DataFrame([['Julie', 'Sheets were dirty'],
                   ['Samantha', 'Meal arrived cold'],
                   ['Rachel', 'Cocktails were delicious']],
                  columns=['User', 'Review'])

d = {'Keyword': ['Sheets','Cocktails','Meal'],
     'Department' : ['Bedrooms','Restaurant','Restaurant'],
     'Issue Type': ['Beds','Drinks','Food']}

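# Re-key the mapping as keyword -> (department, issue type)
d2 = {key: (dep, iss) for key, dep, iss in
      zip(d['Keyword'], d['Department'], d['Issue Type'])}

def mapper(x):
    # First keyword contained in the review text, or None when nothing matches
    return d2.get(next((i for i in d2 if i in x), None))

# Expand the Series of tuples into two columns
df[['Department', 'IssueType']] = df['Review'].apply(mapper).apply(pd.Series)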

Result:

       User                    Review  Department IssueType
0     Julie         Sheets were dirty    Bedrooms      Beds
1  Samantha         Meal arrived cold  Restaurant      Food
2    Rachel  Cocktails were delicious  Restaurant    Drinks
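
For larger data, a vectorized variant may be worth trying. The sketch below swaps in a different technique from the one above: it pulls the keyword out of each review with Series.str.extract and then left-merges against the lookup table. The extra 'Tom' row and the names lookup, pattern and out are assumptions added for illustration, to show what happens when no keyword matches.

```python
import re
import pandas as pd

# Reviews from the question, plus one hypothetical row with no keyword.
df = pd.DataFrame({'User': ['Julie', 'Samantha', 'Rachel', 'Tom'],
                   'Review': ['Sheets were dirty', 'Meal arrived cold',
                              'Cocktails were delicious', 'Lovely view']})

lookup = pd.DataFrame({'Keyword': ['Sheets', 'Cocktails', 'Meal'],
                       'Department': ['Bedrooms', 'Restaurant', 'Restaurant'],
                       'Issue Type': ['Beds', 'Drinks', 'Food']})

# One capture group that matches any of the keywords literally.
pattern = '(' + '|'.join(re.escape(k) for k in lookup['Keyword']) + ')'

# Keyword found in each review; NaN where none is found.
df['Keyword'] = df['Review'].str.extract(pattern, expand=False)

# A left merge keeps every review, matched or not.
out = df.merge(lookup, on='Keyword', how='left')
```

Unlike the dictionary loop, the regular expression picks whichever keyword appears earliest in the review text, so the two approaches can disagree when a review contains several keywords. Reviews without any keyword end up with NaN in Department and Issue Type.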