My data frame looks like this:
id start_date end_date
3001 1-1-2000 5-1-2000
3849 5-1-2001 8-1-2001
8927 6-1-2006 9-1-2006
What I want is a new data frame indexed by id, with a date column holding the dates from start_date to end_date in monthly increments.
index date
3001 1/1/2000
3001 2/1/2000
3001 3/1/2000
3001 4/1/2000
3001 5/1/2000
3849 5/1/2001
3849 6/1/2001
3849 7/1/2001
3849 8/1/2001
8927 6/1/2006
8927 7/1/2006
8927 8/1/2006
8927 9/1/2006
Answer 0 (score: 1)
Recreate the data frame:
In [38]: import pandas as pd
In [39]: df = pd.DataFrame({"id": [3001, 3849, 8927], "start_date": ['1-1-2000', '1-5-2001', '1-6-2006'], "end_date": ['1-5-2000', '1-8-2001', '1-9-2006']})
Set the index:
In [40]: df = df.set_index('id')
Iterate over the rows:
In [41]: newdf = pd.DataFrame()
In [42]: for id, row in df.iterrows():
   ....:     newdf = pd.concat([newdf, pd.DataFrame({"id": id, "date": pd.date_range(start=row.start_date, end=row.end_date, freq='D')})], ignore_index=True)
   ....:     print(id)
   ....:
3001
3849
8927
In [43]: newdf = newdf.set_index('id')
In [44]: newdf
Out[44]:
date
id
3001 2000-01-01
3001 2000-01-02
3001 2000-01-03
3001 2000-01-04
3001 2000-01-05
3849 2001-01-05
3849 2001-01-06
3849 2001-01-07
3849 2001-01-08
8927 2006-01-06
8927 2006-01-07
8927 2006-01-08
8927 2006-01-09
And that's it.
I'm not sure about your date format: is the day first, or the month? You can check here: Specifying date format when converting with pandas.to_datetime
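If the dates are month-first, as the desired output in the question suggests, a monthly version might look like the sketch below. The explicit format string, the choice of freq="MS" (month start), and the names pieces and monthly are assumptions for illustration, not part of the session above.

```python
import pandas as pd

# Dates as written in the question, read as month-day-year (an assumption).
df = pd.DataFrame({
    "id": [3001, 3849, 8927],
    "start_date": ["1-1-2000", "5-1-2001", "6-1-2006"],
    "end_date": ["5-1-2000", "8-1-2001", "9-1-2006"],
})

# An explicit format removes the day-first / month-first ambiguity.
for col in ("start_date", "end_date"):
    df[col] = pd.to_datetime(df[col], format="%m-%d-%Y")

# Build one small frame per id, then concatenate once at the end
# instead of growing a frame inside the loop.
pieces = [
    pd.DataFrame({
        "id": row.id,
        "date": pd.date_range(start=row.start_date, end=row.end_date, freq="MS"),
    })
    for row in df.itertuples(index=False)
]
monthly = pd.concat(pieces, ignore_index=True).set_index("id")
print(monthly)
```

If the dates turn out to be day-first instead, changing the format string to "%d-%m-%Y" should be the only adjustment needed.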
Sure; adapting the other answer to flag the last date for each id:
In [32]: b = newdf.reset_index().groupby('id').date.transform(
   ....:     lambda ii: ii.max())
In [33]: b
Out[33]:
0 2000-01-05
1 2000-01-05
2 2000-01-05
3 2000-01-05
4 2000-01-05
5 2001-01-08
6 2001-01-08
7 2001-01-08
8 2001-01-08
9 2006-01-09
10 2006-01-09
11 2006-01-09
12 2006-01-09
Name: date, dtype: datetime64[ns]
In [37]: newdf['new_col'] = (newdf.date.values == b.values).astype(int)
In [38]: newdf
Out[38]:
date new_col
id
3001 2000-01-01 0
3001 2000-01-02 0
3001 2000-01-03 0
3001 2000-01-04 0
3001 2000-01-05 1
3849 2001-01-05 0
3849 2001-01-06 0
3849 2001-01-07 0
3849 2001-01-08 1
8927 2006-01-06 0
8927 2006-01-07 0
8927 2006-01-08 0
8927 2006-01-09 1
For some reason, I can't do this in a single step:
newdf['new_col'] = newdf.reset_index().groupby('id').date.transform( lambda ii: ii == ii.max())
... not sure why.
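One possible explanation, offered as an assumption since the exact behavior depends on the pandas version: after reset_index() the transform result is labeled 0, 1, 2, ... while newdf is labeled by id, and column assignment aligns on labels rather than on position. The sketch below uses a small stand-in for newdf and groups on the index level so the labels stay the same; the names b, last, and the sample rows are made up for illustration.

```python
import pandas as pd

# Small stand-in for newdf: a non-unique 'id' index and a 'date' column.
newdf = pd.DataFrame({
    "id": [3001, 3001, 3849, 3849],
    "date": pd.to_datetime(
        ["2000-01-04", "2000-01-05", "2001-01-07", "2001-01-08"]
    ),
}).set_index("id")

# After reset_index() the transform result carries positional labels,
# which do not match the id labels of newdf.
b = newdf.reset_index().groupby("id")["date"].transform("max")
print(b.index.tolist())
print(newdf.index.tolist())

# Grouping on the index level keeps the id labels, so the comparison
# and the assignment line up row by row.
last = newdf.groupby(level="id")["date"].transform("max")
newdf["new_col"] = (newdf["date"] == last).astype(int)
print(newdf)
```

Comparing the underlying arrays (for example with .values) is another way to sidestep label alignment, at the cost of relying on row order.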