Create start date and end date columns from the given date records

Time: 2020-06-01 08:27:51

Tags: python pandas dataframe date

Suppose this is my dataset:

Name date
A    15-01-01
A    15-01-31
A    16-02-02
A    16-03-04
B    17-04-05
B    17-05-08
B    17-07-09
C    18-01-02
C    18-02-03

I want to add each person's first and last date as new columns.

The result I want:

Name date     startdate enddate
A    15-01-01 15-01-01  16-03-04
A    15-01-31 15-01-01  16-03-04
A    16-02-02 15-01-01  16-03-04
A    16-03-04 15-01-01  16-03-04
B    17-04-05 17-04-05  17-07-09
B    17-05-08 17-04-05  17-07-09
B    17-07-09 17-04-05  17-07-09
C    18-01-02 18-01-02  18-02-03
C    18-02-03 18-01-02  18-02-03

Is there a way to do this? I would appreciate it if you could let me know.

Thanks for reading.

2 answers:

Answer 0: (score: 3)

Use GroupBy.transform together with GroupBy.first and GroupBy.last, and assign the results to new columns:

g = df.groupby('Name')['date']
df = df.assign(startdate = g.transform('first'), enddate = g.transform('last'))
print (df)
  Name      date startdate   enddate
0    A  15-01-01  15-01-01  16-03-04
1    A  15-01-31  15-01-01  16-03-04
2    A  16-02-02  15-01-01  16-03-04
3    A  16-03-04  15-01-01  16-03-04
4    B  17-04-05  17-04-05  17-07-09
5    B  17-05-08  17-04-05  17-07-09
6    B  17-07-09  17-04-05  17-07-09
7    C  18-01-02  18-01-02  18-02-03
8    C  18-02-03  18-01-02  18-02-03
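
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, with the dates kept as plain strings. Note that 'first' and 'last' pick rows by position within each group, so this only returns the earliest and latest dates when the rows are already in chronological order, as they are in the question.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as plain strings here.
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'date': ['15-01-01', '15-01-31', '16-02-02', '16-03-04',
             '17-04-05', '17-05-08', '17-07-09', '18-01-02', '18-02-03'],
})

# transform broadcasts each group's first/last value back onto every row
# of that group, so the result lines up with the original index.
g = df.groupby('Name')['date']
df = df.assign(startdate=g.transform('first'), enddate=g.transform('last'))
print(df)
```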

Or, if you need the minimum and maximum values, use GroupBy.min and GroupBy.max:

df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
g = df.groupby('Name')['date']
df = df.assign(startdate = g.transform('min'), enddate = g.transform('max'))
print (df)
  Name       date  startdate    enddate
0    A 2015-01-01 2015-01-01 2016-03-04
1    A 2015-01-31 2015-01-01 2016-03-04
2    A 2016-02-02 2015-01-01 2016-03-04
3    A 2016-03-04 2015-01-01 2016-03-04
4    B 2017-04-05 2017-04-05 2017-07-09
5    B 2017-05-08 2017-04-05 2017-07-09
6    B 2017-07-09 2017-04-05 2017-07-09
7    C 2018-01-02 2018-01-02 2018-02-03
8    C 2018-02-03 2018-01-02 2018-02-03
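
The difference between the two approaches only shows up when the rows are not sorted by date. Here is a small sketch of that case. The rows below are hypothetical (not the question's data) and are deliberately shuffled.

```python
import pandas as pd

# Hypothetical rows, deliberately out of chronological order.
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'B', 'B'],
    'date': ['16-02-02', '15-01-01', '15-01-31', '17-07-09', '17-04-05'],
})
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

g = df.groupby('Name')['date']

# 'first'/'last' follow row order, so they return whatever row happens
# to come first or last in each group.
by_position = df.assign(startdate=g.transform('first'), enddate=g.transform('last'))

# 'min'/'max' compare the datetime values, so row order does not matter.
by_value = df.assign(startdate=g.transform('min'), enddate=g.transform('max'))

print(by_position)
print(by_value)
```

For group A, by_position reports 2016-02-02 as the start and 2015-01-31 as the end, while by_value reports 2015-01-01 and 2016-02-02. If you are not sure the data is sorted, min/max on a real datetime column is probably the safer choice.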

An alternative using GroupBy.agg and DataFrame.join:

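# one row per Name holding that group's first and last date
df1 = df.groupby('Name').agg(startdate = ('date','first'), enddate = ('date','last'))
# attach the per-Name values to every row by matching on the Name column
df = df.join(df1, on='Name')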
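
To make the intermediate step visible, here is a small sketch of the agg/join route on a shortened, hypothetical subset of the sample data. The second column is named enddate here so that it matches the desired output in the question. Named aggregation of this form needs pandas 0.25 or later.

```python
import pandas as pd

# Shortened, hypothetical subset of the question's sample data.
df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'B', 'C'],
    'date': ['15-01-01', '16-03-04', '17-04-05', '17-07-09', '18-01-02'],
})

# One row per Name: the index is Name, the columns are startdate and enddate.
summary = df.groupby('Name').agg(startdate=('date', 'first'),
                                 enddate=('date', 'last'))
print(summary)

# join matches df['Name'] against summary's index, so every original row
# receives its group's values and the row count stays the same.
result = df.join(summary, on='Name')
print(result)
```

One possible reason to prefer this over transform is that summary is also useful on its own as a per-name lookup table.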

Answer 1: (score: 1)

Here is one way to do it. I am not entirely sure about it, but it should work.

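import pandas as pd

df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
start_date = {}  # earliest date seen so far for each Name
end_date = {}    # latest date seen so far for each Name

for name, d in zip(df['Name'], df['date']):
    if name not in start_date:
        # first row for this Name: it is both the start and the end
        start_date[name] = d
        end_date[name] = d
    elif d > end_date[name]:
        end_date[name] = d
    elif d < start_date[name]:
        start_date[name] = d

# look up each row's Name to get its group's start and end date
df['startdate'] = df['Name'].map(start_date)
df['enddate'] = df['Name'].map(end_date)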