Pairing missing values between a DataFrame and a deduplicated DataFrame, and writing the result to CSV

Time: 2019-06-04 22:38:09

Tags: python python-3.x pandas

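I have a list of schools and the classes each one offers. I also have a list of the unique classes; each school offers some of them but not others. I want to return the classes missing from each school, together with the school's name.

I have been able to return the list of missing classes for each school, but I can't pair each of those lists with the name of the school it belongs to.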

Read in the DataFrames

import pandas as pd

schools = {'School': ['School A', 'School A', 'School A', 'School B', 'School B', 'School B', 'School C', 'School C', 'School D'], 'Class': ['Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'English', 'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(data=schools)
dfSchool

classes = {'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']}
dfClasses = pd.DataFrame(data=classes)
dfClasses

Group by school

grouped = dfSchool.groupby('School')

newdflist = []

for name, group in grouped:
    newdflist.append(group)
    print(name)
    print(group)

Return the classes missing from each school

# Iterate over the per-school groups rather than a hard-coded count of 4
for group in newdflist:
    missingClasses = dfClasses[~dfClasses['Class'].isin(group['Class'])]
    print(missingClasses)

Actual result:

     Class
3  History
4  Physics

     Class
3  History
4  Physics

     Class
2  English
3  History
4  Physics

       Class
0       Math
1  Chemistry
2    English
3    History

Desired result:

     School    Class
3  School A  History
4  School A  Physics

     School    Class
3  School B  History
4  School B  Physics

     School    Class
2  School C  English
3  School C  History
4  School C  Physics

     School      Class
0  School D       Math
1  School D  Chemistry
2  School D    English
3  School D    History

1 answer:

Answer 0 (score: 0)

Here is a quick answer that prints the desired result, reusing `grouped` and `dfClasses` from the question:

    for name, group in grouped:
        print(name)
        print(dfClasses[~(dfClasses.Class.isin(group["Class"]))])

The result I get from this is:

    School A
         Class
    3  History
    4  Physics
    School B
         Class
    3  History
    4  Physics
    School C
         Class
    2  English
    3  History
    4  Physics
    School D
           Class
    0       Math
    1  Chemistry
    2    English
    3    History

All that is left is to collect these results in a DataFrame instead of printing them.
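
A minimal sketch of one way to do that, assuming the same data as in the question: tag each school's missing classes with the school name, concatenate the pieces, and write them out. The names `missing_frames` and `dfMissing` and the file name `missing_classes.csv` are placeholders chosen for illustration, not something from the question.

```python
import pandas as pd

# Same data as in the question
schools = {'School': ['School A', 'School A', 'School A',
                      'School B', 'School B', 'School B',
                      'School C', 'School C', 'School D'],
           'Class': ['Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'English',
                     'Math', 'Chemistry', 'Physics']}
dfSchool = pd.DataFrame(data=schools)
dfClasses = pd.DataFrame(
    data={'Class': ['Math', 'Chemistry', 'English', 'History', 'Physics']})

missing_frames = []  # one small DataFrame per school (illustrative name)
for name, group in dfSchool.groupby('School'):
    # Classes in the master list that this school does not offer
    missing = dfClasses[~dfClasses['Class'].isin(group['Class'])]
    # Attach the school name as a column and put it first
    missing_frames.append(missing.assign(School=name)[['School', 'Class']])

# Stack the per-school pieces; the original dfClasses index labels are kept
dfMissing = pd.concat(missing_frames)
print(dfMissing)

# Assumed output path; index=False leaves the index labels out of the file
dfMissing.to_csv('missing_classes.csv', index=False)
```

If the repeated index labels are not needed, passing `ignore_index=True` to `pd.concat` gives the combined frame a fresh 0..n-1 index.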

Hope this helps :)