I have some student data and the subjects they chose.
id    name   date from   date to     Subjectname  note
1188  Cera   01-08-2016  30-09-2016  math         approved
1188  Cera   01-10-2016              elec
1199  ron    01-06-2017              english      app-true
1288  Snow   01-01-2017              tally
1433  sansa  25-01-2016  14-07-2016  tally
1433  sansa  15-07-2016  16-01-2017  tally        relected
1844  amy    01-10-2016  10-11-2017  adv
1522  stark  01-01-2016              phy
1722  sid    01-06-2017  31-03-2018  history
1722  sid    01-04-2018              history      as per request
1844  amy    01-01-2016  30-09-2016  science
2100  arya   01-08-2016  30-09-2016  english
2100  arya   01-10-2016  31-05-2017  math         taken
2100  arya   01-06-2017              english
I am looking for output like this:
id name from to subject from subject to
1188 Cera 01-08-2016 01-10-2016 math elec
1199 ron 01-06-2017 english
1288 Snow 01-01-2017 tally
1433 sansa 25-01-2016 16-01-2017 tally tally
1522 stark 01-01-2016 phy
1722 sid 01-06-2017 01-04-2018 history history
1844 amy 01-01-2016 10-11-2017 science adv
2100 arya 01-08-2016 31-05-2017 english math
2100 arya 01-06-2017 math english
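The "from" column holds the minimum date for each name, and the "to" column holds the maximum date for each name. The "subject from" column holds the Subjectname value that corresponds to the "from" date and the name. The "subject to" column holds the Subjectname value that corresponds to the "to" date and the name.
I need to track the transactions made by the students and the subject names they changed between (subject from and subject to). Please let me know how to achieve this.
Alternatively, please let me know if there is a simple way to get an output with the transaction details for each student and the subjects they changed.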
Answer 0 (score: 0)
Use DataFrameGroupBy.agg together with set_index on the Subjectname column, so that idxmin and idxmax return the subjects at each group's minimum and maximum datetimes.
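A minimal sketch of this idea, not the answerer's original code. The sample rows are copied from the question, the day-first date parsing is an assumption about the data, and only the "date from" column is used, so "to" here is the latest start date rather than the latest end date.

```python
import pandas as pd

# Assumed sample: a few rows taken from the question.
df = pd.DataFrame({
    "id": [1188, 1188, 1199, 1844, 1844],
    "name": ["Cera", "Cera", "ron", "amy", "amy"],
    "date from": ["01-08-2016", "01-10-2016", "01-06-2017",
                  "01-10-2016", "01-01-2016"],
    "Subjectname": ["math", "elec", "english", "adv", "science"],
})

# Assumption: the dates are written day-first (DD-MM-YYYY).
df["date from"] = pd.to_datetime(df["date from"], dayfirst=True)

# With Subjectname as the index, idxmin/idxmax return the subject label of
# the row that holds each group's earliest / latest date.
out = (
    df.set_index("Subjectname")
      .groupby(["id", "name"])["date from"]
      .agg(["min", "max", "idxmin", "idxmax"])
      .reset_index()
)
out.columns = ["id", "name", "from", "to", "subject from", "subject to"]
print(out)
```

Students with a single row get the same subject in both "subject from" and "subject to".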
Answer 1 (score: 0)
id name date_from date_to subject_name note
0 1188 Cera 2016-01-08 30-09-2016 math approved
1 1188 Cera 2016-01-10 elec
2 1199 ron 2017-01-06 english app-true
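The date_from values in the table above show 01-08-2016 as 2016-01-08, which suggests month-first parsing, while the question's dates appear to be day-first (30-09-2016 can only be read that way). A small sketch of the difference, assuming pandas' default of dayfirst=False:

```python
import pandas as pd

# By default pandas reads an ambiguous "01-08-2016" month-first.
month_first = pd.to_datetime("01-08-2016")

# dayfirst=True treats the leading field as the day instead.
day_first = pd.to_datetime("01-08-2016", dayfirst=True)

print(month_first.date())  # 2016-01-08
print(day_first.date())    # 2016-08-01
```

If the data is day-first, the minimum and maximum dates per student can change depending on which parsing is used.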
I am just pasting the code here.
import pandas as pd

# stack date_from and date_to into a single "date" column so that the
# min and max date can be taken over both
cols = ['id', 'name', 'date', 'subject_name', 'note']
df1 = df[['id', 'name', 'date_from', 'subject_name', 'note']].copy()
df2 = df[['id', 'name', 'date_to', 'subject_name', 'note']].copy()
df1.columns = cols
df2.columns = cols
df3 = pd.concat([df1, df2])
# the dates are written day-first (DD-MM-YYYY)
df3['date'] = pd.to_datetime(df3['date'], dayfirst=True)
# drop rows without a date (an empty date_to becomes NaT)
df3 = df3.dropna(subset=['date'])
df3:
id name date subject_name note
0 1188 Cera 2016-01-08 math approved
1 1188 Cera 2016-01-10 elec
2 1199 ron 2017-01-06 english app-true
0 1188 Cera 2016-09-30 math approved
# get the latest ("to") and earliest ("from") date for each name
df4 = df3.groupby('name').agg({'date': ['max', 'min']})
df4.columns = ['to', 'from']
df4 = df4.reset_index()
df4:
name to from
0 Cera 2016-09-30 2016-01-08
1 ron 2017-01-06 2017-01-06
# match "name" and "from"/"to" in df4 against "name" and "date" in df3
# to get the earliest and the latest subject for each name
df_sub_from = pd.merge(df4, df3, how='left', left_on=['name', 'from'], right_on=['name', 'date'])
df_sub_to = pd.merge(df4, df3, how='left', left_on=['name', 'to'], right_on=['name', 'date'])
# drop the columns that are no longer needed
df_sub_from = df_sub_from[['id', 'name', 'from', 'to', 'subject_name']]
df_sub_to = df_sub_to[['id', 'name', 'from', 'to', 'subject_name']]
# merge the two together and give the columns readable names
df_final = pd.merge(df_sub_from, df_sub_to, on=['id', 'name', 'from', 'to'])
df_final.columns = ['id', 'name', 'from', 'to', 'subject_from', 'subject_to']
Here is the result:
id name from to subject_from subject_to
0 1188 Cera 2016-01-08 2016-09-30 math math
1 1199 ron 2017-01-06 2017-01-06 english english
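The result above has one row per student, so intermediate changes (for example arya's english, math, english) do not appear. For the alternative the question asks about, one row per subject change, a minimal sketch could use groupby with shift. The sample rows are copied from the question; the day-first parsing and the output column names changed_on, subject_from and subject_to are assumptions, and the layout differs from the output requested in the question.

```python
import pandas as pd

# Assumed sample: arya's three rows from the question.
df = pd.DataFrame({
    "id": [2100, 2100, 2100],
    "name": ["arya", "arya", "arya"],
    "date_from": ["01-08-2016", "01-10-2016", "01-06-2017"],
    "subject_name": ["english", "math", "english"],
})
# Assumption: the dates are written day-first (DD-MM-YYYY).
df["date_from"] = pd.to_datetime(df["date_from"], dayfirst=True)

# Order each student's rows by start date, then pair every row with the
# previous subject of the same student.
df = df.sort_values(["id", "date_from"])
df["subject_from"] = df.groupby("id")["subject_name"].shift()

# Keep only rows where the subject actually changed; the first row of each
# student has no previous subject and is dropped.
changed = df["subject_from"].notna() & (df["subject_from"] != df["subject_name"])
changes = (
    df[changed]
    .rename(columns={"subject_name": "subject_to", "date_from": "changed_on"})
    .loc[:, ["id", "name", "changed_on", "subject_from", "subject_to"]]
    .reset_index(drop=True)
)
print(changes)
```

A student who re-enrols in the same subject (such as sansa with tally twice) produces no row here, because only genuine changes are kept.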