我有一个巨大的数据框,其日期时间类型列名为dt
,数据框已根据dt
进行排序。我想根据dt
将数据框拆分为多个数据框,每个数据框都包含1 hr
范围内的行。
分割
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
进入
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
答案 0 :(得分:7)
根据groupby
转换为dt
的列hour
的第一个值的差异,您需要astype
:
S = pd.to_datetime(df.dt)
for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')]):
print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
List comprehension
解决方案:
S = pd.to_datetime(df.dt)
print ((S - S[0]).astype('timedelta64[h]'))
0 0.0
1 0.0
2 0.0
3 1.0
4 1.0
5 3.0
Name: dt, dtype: float64
L = [g.reset_index(drop=True) for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
print (L[1])
dt text
0 20160811 12:36 d
1 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
旧解决方案,按hour
分割:
您可以groupby
使用dt.hour
,但首先需要转换dt
to_datetime
:
for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]):
print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
dt text
0 20160811 14:32 f
List comprehension
解决方案:
L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
print (L[1])
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
或者使用list comprehension
将dt
列转换为datetime
:
df.dt = pd.to_datetime(df.dt)
L =[g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]
print (L[1])
dt text
0 2016-08-11 12:03:00 c
1 2016-08-11 12:36:00 d
2 2016-08-11 12:52:00 e
print (L[2])
dt text
0 2016-08-11 14:32:00 f
如果需要按date
和hour
s分开:
#changed dataframe for testing
print (df)
dt text
0 20160811 11:05 a
1 20160812 11:35 b
2 20160813 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
serie = pd.to_datetime(df.dt)
for i, g in df.groupby([serie.dt.date, serie.dt.hour]):
print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
dt text
0 20160812 11:35 b
dt text
0 20160813 12:03 c
答案 1 :(得分:2)