Transform a DataFrame to track changes

Time: 2018-11-02 06:48:28

Tags: python pandas dataframe

I have some student data along with the subjects they chose.

id     name   date from  date to    Subjectname  note
1188    Cera  01-08-2016 30-09-2016 math         approved
1188    Cera  01-10-2016            elec    
1199    ron   01-06-2017            english      app-true
1288    Snow  01-01-2017            tally   
1433    sansa 25-01-2016 14-07-2016 tally   
1433    sansa 15-07-2016 16-01-2017 tally        relected
1844    amy   01-10-2016 10-11-2017 adv 
1522    stark 01-01-2016            phy 
1722    sid   01-06-2017 31-03-2018 history 
1722    sid   01-04-2018            history      as per request
1844    amy   01-01-2016 30-09-2016 science 
2100    arya  01-08-2016 30-09-2016 english 
2100    arya  01-10-2016 31-05-2017 math         taken
2100    arya  01-06-2017            english 

I am looking for an output like this:

id      name    from        to          subject from subject to
1188    Cera    01-08-2016  01-10-2016  math         elec
1199    ron     01-06-2017              english 
1288    Snow    01-01-2017              tally   
1433    sansa   25-01-2016  16-01-2017  tally        tally
1522    stark   01-01-2016              phy 
1722    sid     01-06-2017  01-04-2018  history      history
1844    amy     01-01-2016  10-11-2017  science      adv
2100    arya    01-08-2016  31-05-2017  english      math
2100    arya    01-06-2017              math         english

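The "from" column holds the minimum date value for each name. The "to" column holds the maximum date value for each name. The "subject from" column holds the Subjectname value that corresponds to the "from" and "name" columns. The "subject to" column holds the Subjectname value that corresponds to the "to" and "name" columns.

I need to track the transitions made by students and the subject names they changed (subject from and subject to). Please let me know how to achieve this.

Alternatively, please let me know if there is a simple way to get an output that contains the transition details for each student and the subjects they changed.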

2 Answers:

Answer 0: (score: 0)

Use DataFrameGroupBy.agg together with set_index on the Subjectname column, so that idxmin and idxmax return the subject at the minimum and maximum datetime of each group.
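The code for this answer did not survive, so the following is only a minimal sketch of the idea it describes, not the author's original code. It assumes a small hypothetical subset of the question's rows, uses only the "date from" column, and parses the dates as day-first (DD-MM-YYYY). The output column names are chosen here for illustration.

```python
import pandas as pd

# Hypothetical subset of the question's data (ids 1188, 1199 and 1844).
df = pd.DataFrame({
    "id": [1188, 1188, 1199, 1844, 1844],
    "name": ["Cera", "Cera", "ron", "amy", "amy"],
    "date from": ["01-08-2016", "01-10-2016", "01-06-2017",
                  "01-10-2016", "01-01-2016"],
    "Subjectname": ["math", "elec", "english", "adv", "science"],
})

# The dates are day-first (DD-MM-YYYY), so state the format explicitly.
df["date from"] = pd.to_datetime(df["date from"], format="%d-%m-%Y")

# With Subjectname as the index, idxmin/idxmax return the subject label
# at the earliest and latest start date within each (id, name) group.
out = (
    df.set_index("Subjectname")
      .groupby(["id", "name"])["date from"]
      .agg(["min", "max", "idxmin", "idxmax"])
      .reset_index()
)
out.columns = ["id", "name", "from", "to", "subject_from", "subject_to"]
print(out)
```

In this sketch "to" is the latest start date, and a student with a single row gets the same subject in both columns. The question's desired output leaves those cells blank and also considers "date to", so some further handling would be needed to match it exactly.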

Answer 1: (score: 0)

Taking your first 3 rows as my df, I can demonstrate how to do this. df:

     id  name  date_from     date_to subject_name      note
0  1188  Cera 2016-01-08  30-09-2016         math  approved
1  1188  Cera 2016-01-10                     elec
2  1199   ron 2017-01-06                  english  app-true

I am just pasting the code here.

import pandas as pd

# Stack date_from and date_to into one "date" column to get the max and min date
cols = ['id', 'name', 'date', 'subject_name', 'note']
df1 = df[['id', 'name', 'date_from', 'subject_name', 'note']].copy()
df2 = df[['id', 'name', 'date_to', 'subject_name', 'note']].copy()
df1.columns = cols
df2.columns = cols
df3 = pd.concat([df1, df2])
df3['date'] = pd.to_datetime(df3['date'])
# Drop rows without a date (empty date_to values become NaT)
df3 = df3.dropna(subset=['date'])
df3:
     id  name       date subject_name      note
0  1188  Cera 2016-01-08         math  approved
1  1188  Cera 2016-01-10         elec
2  1199   ron 2017-01-06      english  app-true
0  1188  Cera 2016-09-30         math  approved
# Get the "from" and "to" date for each name
df4 = df3.groupby('name').agg({'date': ['max', 'min']})
df4.columns = ['to', 'from']
df4 = df4.reset_index()
df4:
   name         to       from
0  Cera 2016-09-30 2016-01-08
1   ron 2017-01-06 2017-01-06
# Match "name" and "from"/"to" in df4 against "name" and "date" in df3 to get the earliest and latest subject
df_sub_from = pd.merge(df4, df3, how='left', left_on=['name', 'from'], right_on=['name', 'date'])
df_sub_to = pd.merge(df4, df3, how='left', left_on=['name', 'to'], right_on=['name', 'date'])
# Remove unneeded columns
df_sub_from = df_sub_from[['id', 'name', 'from', 'to', 'subject_name']]
df_sub_to = df_sub_to[['id', 'name', 'from', 'to', 'subject_name']]
# Merge together and rename nicely
df_final = pd.merge(df_sub_from, df_sub_to, on=['id', 'name', 'from', 'to'])
df_final.columns = ['id', 'name', 'from', 'to', 'subject_from', 'subject_to']

Here is the result:

     id  name       from         to subject_from subject_to
0  1188  Cera 2016-01-08 2016-09-30         math       math
1  1199   ron 2017-01-06 2017-01-06      english    english
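The steps in this answer can be condensed. The following is a sketch only, using melt and a sort in place of the two merges. It assumes the same three rows and column names as the df in this answer, with the dates kept as DD-MM-YYYY strings and parsed with an explicit day-first format. The name stacked is introduced here for illustration.

```python
import pandas as pd

# The first three rows of the question, with dates as DD-MM-YYYY strings.
df = pd.DataFrame({
    "id": [1188, 1188, 1199],
    "name": ["Cera", "Cera", "ron"],
    "date_from": ["01-08-2016", "01-10-2016", "01-06-2017"],
    "date_to": ["30-09-2016", None, None],
    "subject_name": ["math", "elec", "english"],
})

# Stack date_from and date_to into one "date" column (long format).
stacked = df.melt(
    id_vars=["id", "name", "subject_name"],
    value_vars=["date_from", "date_to"],
    value_name="date",
)
stacked["date"] = pd.to_datetime(stacked["date"], format="%d-%m-%Y")
stacked = stacked.dropna(subset=["date"]).sort_values("date")

# After sorting by date, the first row per student carries the earliest
# date and subject, and the last row carries the latest ones.
g = stacked.groupby(["id", "name"])
df_final = pd.DataFrame({
    "from": g["date"].first(),
    "to": g["date"].last(),
    "subject_from": g["subject_name"].first(),
    "subject_to": g["subject_name"].last(),
}).reset_index()
print(df_final)
```

The df shown in this answer has 01-08-2016 parsed month-first as 2016-01-08. With the day-first format used in this sketch, Cera comes out as math to elec with dates 2016-08-01 and 2016-10-01, which agrees with the Cera row in the question's desired output.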