I have a DataFrame df with a datetime column. The full DataFrame has 20 million rows; for convenience, I show only 3 here.
import numpy as np
import pandas as pd
df = pd.DataFrame({})
df['Date'] = pd.to_datetime(np.arange(0,3), unit='h', origin='2018-08-01 00:00:00')
Date
0 2018-08-01 00:00:00
1 2018-08-01 01:00:00
2 2018-08-01 02:00:00
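From the datetime, I want to create new columns '00_hrs', '01_hrs', '02_hrs' (and so on, up to '23_hrs'), each holding 0 or 1: 1 if the row's datetime falls in the hour named by the column, 0 otherwise.
The result should look like this: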
Date 00_hrs 01_hrs 02_hrs ... 23_hrs
0 2018-08-01 00:00:00 1 0 0 0
1 2018-08-01 01:00:00 0 1 0 0
2 2018-08-01 02:00:00 0 0 1 0
Answer 0 (score: 2)
Use get_dummies on the hour strings produced by Series.dt.strftime, then add the result to the original with DataFrame.join:
# dtype=int keeps the 0/1 output shown below (pandas 2.0+ returns bool by default)
df = df.join(pd.get_dummies(df['Date'].dt.strftime('%H_hrs'), dtype=int))
print(df)
Date 00_hrs 01_hrs 02_hrs
0 2018-08-01 00:00:00 1 0 0
1 2018-08-01 01:00:00 0 1 0
2 2018-08-01 02:00:00 0 0 1
If some hours may be missing from the data, add them with DataFrame.reindex before joining:
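# All 24 column names, so hours absent from the data still get a column of zeros
hours = [f'{n:02}_hrs' for n in range(24)]
# dtype=int keeps the 0/1 output shown below (pandas 2.0+ returns bool by default)
df = (df.join(pd.get_dummies(df['Date'].dt.strftime('%H_hrs'), dtype=int)
                .reindex(hours, axis=1, fill_value=0)))
print(df)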
Date 00_hrs 01_hrs 02_hrs 03_hrs 04_hrs 05_hrs 06_hrs \
0 2018-08-01 00:00:00 1 0 0 0 0 0 0
1 2018-08-01 01:00:00 0 1 0 0 0 0 0
2 2018-08-01 02:00:00 0 0 1 0 0 0 0
07_hrs 08_hrs 09_hrs 10_hrs 11_hrs 12_hrs 13_hrs 14_hrs 15_hrs \
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
16_hrs 17_hrs 18_hrs 19_hrs 20_hrs 21_hrs 22_hrs 23_hrs
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
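Since the full DataFrame has 20 million rows, it may be worth comparing the approach above against one that skips the string formatting step. The following is only a sketch of such an alternative, not part of the original answer: it compares Series.dt.hour against the values 0 to 23 with NumPy broadcasting. The helper name add_hour_indicators and the uint8 dtype are assumptions chosen for illustration, and no timing is claimed here; measure both on your own data.

```python
import numpy as np
import pandas as pd


def add_hour_indicators(df, col="Date"):
    """Return a copy of df with 24 indicator columns, 00_hrs to 23_hrs.

    Hypothetical helper for illustration; `col` must be a datetime column.
    """
    hours = df[col].dt.hour.to_numpy()
    # Broadcasting: an (n_rows, 1) column of hours against 0..23 gives an
    # (n_rows, 24) boolean matrix with at most one True per row.
    onehot = (hours[:, None] == np.arange(24)).astype(np.uint8)
    names = [f"{h:02}_hrs" for h in range(24)]
    # Reuse df's index so the join aligns row by row.
    return df.join(pd.DataFrame(onehot, index=df.index, columns=names))


df = pd.DataFrame(
    {"Date": pd.to_datetime(np.arange(0, 3), unit="h", origin="2018-08-01 00:00:00")}
)
out = add_hour_indicators(df)
print(out.iloc[:, :4])
```

Because all 24 columns are always created, no reindex step is needed for hours that do not occur in the data. Using uint8 rather than the default integer type keeps the 24 extra columns to one byte per value, which can matter at 20 million rows.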