How do I merge two DataFrames using matching indexes?

Asked: 2020-04-18 02:48:15

Tags: python-3.x pandas

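I have two DataFrames: Historic and Applet. Historic contains a list of every course my school has offered in the past, and Applet contains all the courses my school currently offers.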

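I want to merge the two DataFrames so that any course in Applet that is not in Historic gets added, and any course that is in both overwrites its copy in Historic (some courses may have updated information, which should replace their Historic entries).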

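I am currently using Historic.combine_first(Applet) to merge the two on their indexes. However, I want the duplicated entries from Applet to overwrite their Historic entries, and combine_first only fills in values that are missing in Historic while keeping the ones already there.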

Code:

    def update2(self):
        historic = pd.read_csv('course_history.txt', header=None, sep='"', encoding = 'ISO-8859-1',
                               names=['Course_ID', 'Course_Title', 'Professor_Name','Meeting_Time','Enrollment','Room','Year','Term','Credit'],index_col=[0,6,7])
        winnet = pd.DataFrame(self.data, columns =['Course_ID', 'Course_Title', 'Professor_Name','Meeting_Time','Enrollment','Room','Year','Term','Credit'] )
        winnet.set_index(['Course_ID','Year','Term'], inplace=True)
        historic3 = historic.combine_first(winnet)

Historic DataFrame:

Course_ID   Year    Term                                                                           ...
AC 230 01   2020-21 May                         Accounting Systems              Crouse, Justin D.  ...    ROOM NULL   1.00
AC 429 01   2020-21 May                                 CPA Review              Sommermeyer, Eric  ...    ROOM NULL   1.00
ART 150 01  2020-21 May          20th-Century Art, Media, & Design            Fedeler, Barbara J.  ...    ROOM NULL   1.00
ART 208 01  2020-21 May                              Photography I               Payne, Thomas R.  ...    ROOM NULL   1.00
PSY 222 01  2018-19 FA                 Cognitive Psychology                   Eslick Watkins, A    ...    ROOM NULL   1.00

Applet DataFrame:

Course_ID       Year    Term
PSY  101  01    2018-19 FA           Introduction to Psychology            Bane, C    T H  9:35AM-11:15AM    40/44/0  LH   330     1.00
PSY  101  02    2018-19 FA           Introduction to Psychology  Eslick Watkins, A     T H  1:00PM-2:40PM    40/43/0  SC   134     1.00
PSY  210  10    2018-19 FA                    Child Development           Munir, S    T H  9:35AM-11:15AM    30/10/0  LH   327     0.50
PSY  211  20    2018-19 FA               Adolescent Development           Munir, S     T H  1:00PM-2:40PM     30/6/0  LH   330     0.50
PSY  222  01    2018-19 FA                 Cognitive Psychology  Eslick Watkins, A    T H  9:35AM-11:15AM    30/24/0  LH   324     1.00

1 answer:

Answer 0 (score: 1)

You can use concat first and then drop_duplicates. Put Applet first in the concat so that its rows are the ones kept.

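cols = [columns_to_judge_duplicates]  # placeholder: the columns that identify a course
# subset takes column names, so call reset_index() first if the keys live in the index
combined = pd.concat([Applet, Historic])
# the parameter is keep, not method; keep='first' retains the Applet row for each duplicate
combined = combined.drop_duplicates(subset=cols, keep='first')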
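
Since the question keeps Course_ID, Year and Term in the index rather than in columns, a variant of the same idea is to deduplicate on the index itself. The sketch below is only an illustration under assumptions: the two small frames are made-up stand-ins for Historic and Applet, with a single Room column, and it relies on `Index.duplicated` rather than `drop_duplicates`.

```python
import pandas as pd

index_cols = ["Course_ID", "Year", "Term"]

# Hypothetical stand-in for Historic (only one data column for brevity).
historic = pd.DataFrame(
    {
        "Course_ID": ["AC 230 01", "PSY 222 01"],
        "Year": ["2020-21", "2018-19"],
        "Term": ["May", "FA"],
        "Room": ["ROOM NULL", "ROOM NULL"],
    }
).set_index(index_cols)

# Hypothetical stand-in for Applet; PSY 222 01 also exists in historic.
applet = pd.DataFrame(
    {
        "Course_ID": ["PSY 101 01", "PSY 222 01"],
        "Year": ["2018-19", "2018-19"],
        "Term": ["FA", "FA"],
        "Room": ["LH 330", "LH 324"],
    }
).set_index(index_cols)

# Applet goes first so that, for a repeated index key, its row is the one kept.
combined = pd.concat([applet, historic])

# Keep only the first occurrence of each (Course_ID, Year, Term) key.
combined = combined[~combined.index.duplicated(keep="first")].sort_index()

print(combined)
```

Here the PSY 222 01 row takes Applet's room, AC 230 01 survives from Historic unchanged, and PSY 101 01 is added as a new row. The keys have to match exactly for this to work. In the printed frames above, the Applet Course_ID values appear to have extra spaces ("PSY  222  01" versus "PSY 222 01"); if that is in the data rather than a display artifact, the IDs would need to be normalized before combining.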