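So I have two DataFrames: Historic and Applet. Historic contains a list of every course my school has offered in the past, and Applet contains every course my school currently offers.
I want to merge the two DataFrames so that any item in Applet that does not exist in Historic is added, and any item that does exist overwrites its copy in Historic (some courses may have updated information, and their Historic entries should be overwritten with it).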
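I am currently using Historic.combine_first(Applet)
to combine the two frames on their (Course_ID, Year, Term) index. However, I want entries that appear in both to overwrite their Historic version, rather than simply being kept as they were.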
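For context, `combine_first` gives priority to the calling frame: where both frames share a key, the caller's non-null values are kept and the other frame only fills the caller's NaNs. The toy sketch below uses one key from the printouts further down (PSY 222 01, 2018-19, FA); the `Enrollment` value for the Historic row is hidden in the printout, so the NaN used here is an assumption for illustration only. It also shows that swapping the order lets the Applet values win.

```python
import numpy as np
import pandas as pd

# Toy frames sharing one key; the Historic Enrollment NaN is illustrative only
idx = pd.MultiIndex.from_tuples([("PSY 222 01", "2018-19", "FA")],
                                names=["Course_ID", "Year", "Term"])
historic = pd.DataFrame({"Room": ["ROOM NULL"], "Enrollment": [np.nan]}, index=idx)
applet = pd.DataFrame({"Room": ["LH 324"], "Enrollment": ["30/24/0"]}, index=idx)

# The caller wins wherever it has a non-null value; `other` only fills its NaNs
old_wins = historic.combine_first(applet)
# Swapping the order lets the current (Applet) values take precedence
new_wins = applet.combine_first(historic)

key = ("PSY 222 01", "2018-19", "FA")
print(old_wins.loc[key, "Room"])  # ROOM NULL
print(new_wins.loc[key, "Room"])  # LH 324
```

Note that with the swapped order, any NaN in an Applet row would still be filled from Historic, which may or may not be what you want for courses whose information was deliberately blanked.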
Code:
import pandas as pd

def update2(self):
    cols = ['Course_ID', 'Course_Title', 'Professor_Name', 'Meeting_Time',
            'Enrollment', 'Room', 'Year', 'Term', 'Credit']
    # Past offerings, indexed by (Course_ID, Year, Term)
    historic = pd.read_csv('course_history.txt', header=None, sep='"', encoding='ISO-8859-1',
                           names=cols, index_col=[0, 6, 7])
    # Current offerings, indexed the same way
    winnet = pd.DataFrame(self.data, columns=cols)
    winnet.set_index(['Course_ID', 'Year', 'Term'], inplace=True)
    # combine_first keeps historic's values wherever both frames share a key
    historic3 = historic.combine_first(winnet)
Historic DataFrame:
Course_ID Year Term ...
AC 230 01 2020-21 May Accounting Systems Crouse, Justin D. ... ROOM NULL 1.00
AC 429 01 2020-21 May CPA Review Sommermeyer, Eric ... ROOM NULL 1.00
ART 150 01 2020-21 May 20th-Century Art, Media, & Design Fedeler, Barbara J. ... ROOM NULL 1.00
ART 208 01 2020-21 May Photography I Payne, Thomas R. ... ROOM NULL 1.00
PSY 222 01 2018-19 FA Cognitive Psychology Eslick Watkins, A ... ROOM NULL 1.00
Applet DataFrame:
Course_ID Year Term
PSY 101 01 2018-19 FA Introduction to Psychology Bane, C T H 9:35AM-11:15AM 40/44/0 LH 330 1.00
PSY 101 02 2018-19 FA Introduction to Psychology Eslick Watkins, A T H 1:00PM-2:40PM 40/43/0 SC 134 1.00
PSY 210 10 2018-19 FA Child Development Munir, S T H 9:35AM-11:15AM 30/10/0 LH 327 0.50
PSY 211 20 2018-19 FA Adolescent Development Munir, S T H 1:00PM-2:40PM 30/6/0 LH 330 0.50
PSY 222 01 2018-19 FA Cognitive Psychology Eslick Watkins, A T H 9:35AM-11:15AM 30/24/0 LH 324 1.00
Answer:
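You can first stack the two frames with concat, putting Applet first,
and then use drop_duplicates so that only the first occurrence of each course (the Applet version) is kept: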
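import pandas as pd

cols = [columns_to_judge_duplicates]  # placeholder: the column(s) that identify the same course
# Applet goes first, so keep='first' retains its version of each duplicated course
combined = pd.concat([Applet, Historic])
combined = combined.drop_duplicates(subset=cols, keep='first')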
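One catch: `drop_duplicates(subset=...)` only looks at columns, while in the question the key (Course_ID, Year, Term) lives in the index. A minimal sketch, assuming that three-part key identifies a course and using a few rows and only the `Room` column from the printouts above for illustration, moves the key into columns before deduplicating and restores it afterwards:

```python
import pandas as pd

key = ["Course_ID", "Year", "Term"]
# A few rows from the question's printouts (only the Room column, for illustration)
historic = pd.DataFrame(
    [["AC 230 01", "2020-21", "May", "ROOM NULL"],
     ["PSY 222 01", "2018-19", "FA", "ROOM NULL"]],
    columns=key + ["Room"]).set_index(key)
applet = pd.DataFrame(
    [["PSY 101 01", "2018-19", "FA", "LH 330"],
     ["PSY 222 01", "2018-19", "FA", "LH 324"]],
    columns=key + ["Room"]).set_index(key)

# Applet first; reset_index turns the key into columns so subset= can see it
combined = pd.concat([applet.reset_index(), historic.reset_index()])
# keep="first" keeps the Applet row for PSY 222 01, then the key becomes the index again
combined = combined.drop_duplicates(subset=key, keep="first").set_index(key)

print(combined)
```

If you prefer to keep the key in the index throughout, `Index.duplicated` does the same job: `stacked = pd.concat([applet, historic])` followed by `stacked[~stacked.index.duplicated(keep="first")]`.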