Merging DataFrames that have no unique index with Python and Pandas

Time: 2018-05-09 14:26:42

Tags: python pandas

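I have been given two DataFrames containing school food ratings by food type across different campuses. The first holds student ratings and the second holds teacher ratings. Neither the row order nor the length of either DataFrame is guaranteed. That said, I need to join the two together.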

import pandas as pd 

student_ratings = pd.DataFrame({'food': ['chinese', 'mexican', 'american', 'chinese', 'mexican', 'american'],
                                'campus': [37, 37, 37, 25, 25, 25],
                                'student_rating': [97, 90, 83, 96, 89, 82]})

teacher_ratings = pd.DataFrame({'food': ['chinese', 'mexican', 'american', 'chinese', 'mexican', 'american', 'chinese', 'mexican', 'american'],
                                'campus': [25, 25, 25, 37, 37, 37, 45, 45, 45],
                                'teacher_rating': [87, 80, 73, 86, 79, 72, 67, 62, 65]})

#...

# SOMETHING LIKE WHAT I'M AFTER...
combined_ratings = pd.DataFrame({'food': ['chinese', 'mexican', 'american', 'chinese', 'mexican', 'american', 'chinese', 'mexican', 'american'],
                                 'campus': [25, 25, 25, 37, 37, 37, 45, 45, 45],
                                 'student_rating': [96, 89, 82, 97, 90, 83, float('nan'), float('nan'), float('nan')],
                                 'teacher_rating': [87, 80, 73, 86, 79, 72, 67, 62, 65]})

I basically want to add columns (possibly several), but I need everything matched on food and campus.

1 Answer:

Answer 0: (score: 2)

It looks like you need an outer merge:

res = pd.merge(student_ratings, teacher_ratings, how='outer')

print(res)

   campus      food  student_rating  teacher_rating
0      37   chinese            97.0              86
1      37   mexican            90.0              79
2      37  american            83.0              72
3      25   chinese            96.0              87
4      25   mexican            89.0              80
5      25  american            82.0              73
6      45   chinese             NaN              67
7      45   mexican             NaN              62
8      45  american             NaN              65
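
When no `on` argument is passed, `pd.merge` joins on every column name the two DataFrames share, which here happens to be `food` and `campus`. If more columns are added later, it may be safer to name the keys explicitly. The following is a minimal sketch of that variation, not part of the original answer; the names `combined` and `flagged`, the sort step, and the use of `validate` and `indicator` are assumptions added for illustration.

```python
import pandas as pd

student_ratings = pd.DataFrame({'food': ['chinese', 'mexican', 'american', 'chinese', 'mexican', 'american'],
                                'campus': [37, 37, 37, 25, 25, 25],
                                'student_rating': [97, 90, 83, 96, 89, 82]})

teacher_ratings = pd.DataFrame({'food': ['chinese', 'mexican', 'american', 'chinese', 'mexican', 'american', 'chinese', 'mexican', 'american'],
                                'campus': [25, 25, 25, 37, 37, 37, 45, 45, 45],
                                'teacher_rating': [87, 80, 73, 86, 79, 72, 67, 62, 65]})

# Name the join keys explicitly so that any extra shared column
# is not silently treated as a key.
# validate='one_to_one' raises MergeError if a (food, campus) pair
# is duplicated in either DataFrame.
combined = (
    pd.merge(student_ratings, teacher_ratings,
             on=['food', 'campus'], how='outer', validate='one_to_one')
      .sort_values(['campus', 'food'])   # make the row order deterministic
      .reset_index(drop=True)
)
print(combined)

# indicator=True adds a '_merge' column showing whether each row came from
# the left DataFrame only, the right DataFrame only, or both.
flagged = pd.merge(student_ratings, teacher_ratings,
                   on=['food', 'campus'], how='outer', indicator=True)
print(flagged[['food', 'campus', '_merge']])
```

Campuses that appear only in `teacher_ratings` end up with `NaN` in `student_rating`, which is why that column is converted to float.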