我需要向数据集中添加更多列。由于一周中有7天,因此,我在数据集中添加了7个其他列,分别为“ day_1”,“ day_2”,...“ day_7”,然后对于所有时间戳数据,我现在要提取日期信息。例如,对于与星期二相对应的行,仅其中的“ day_2”列应为“ 1”,而其他列(day_1,day_3,day_4,day_5,day_6,day_7)应全部为“ 0”。为此,我想使用一种称为“单热编码”的过程。如何为此要求编写代码?
#dataset['day_7'] = dataset.insert(0, 'day_7', 0)
#dataset['day_6'] = dataset.insert(0, 'day_6', 0)
#dataset['day_5'] = dataset.insert(0, 'day_5', 0)
#dataset['day_4'] = dataset.insert(0, 'day_4', 0)
#dataset['day_3'] = dataset.insert(0, 'day_3', 0)
#dataset['day_2'] = dataset.insert(0, 'day_2', 0)
#dataset['day_1'] = dataset.insert(0, 'day_1', 0)
这是数据集中的“日期”列:
#0 2016-01-01 05:00:00
#1 2016-01-01 06:00:00
#2 2016-01-01 07:00:00
#3 2016-01-01 08:00:00
#4 2016-01-01 09:00:00
#5 2016-01-01 10:00:00
#6 2016-01-01 11:00:00
#7 2016-01-01 12:00:00
#8 2016-01-01 13:00:00
#9 2016-01-01 14:00:00
#10 2016-01-01 15:00:00
#11 2016-01-01 16:00:00
#12 2016-01-01 17:00:00
#13 2016-01-01 18:00:00
#14 2016-01-01 19:00:00
#15 2016-01-01 20:00:00
#16 2016-01-01 21:00:00
#17 2016-01-01 22:00:00
#18 2016-01-01 23:00:00
#19 2016-01-02 00:00:00
#20 2016-01-02 01:00:00
#21 2016-01-02 02:00:00
#22 2016-01-02 03:00:00
#23 2016-01-02 04:00:00
#24 2016-01-02 05:00:00
#25 2016-01-02 06:00:00
#26 2016-01-02 07:00:00
#27 2016-01-02 08:00:00
#28 2016-01-02 09:00:00
#29 2016-01-02 10:00:00
#30 2016-01-02 11:00:00
#31 2016-01-02 12:00:00
#32 2016-01-02 13:00:00
#33 2016-01-02 14:00:00
#34 2016-01-02 15:00:00
#35 2016-01-02 16:00:00
#36 2016-01-02 17:00:00
#37 2016-01-02 18:00:00
#38 2016-01-02 19:00:00
#39 2016-01-02 20:00:00
#40 2016-01-02 21:00:00
#41 2016-01-02 22:00:00
#42 2016-01-02 23:00:00
#43 2016-01-03 00:00:00
#44 2016-01-03 01:00:00
#45 2016-01-03 02:00:00
#46 2016-01-03 03:00:00
#47 2016-01-03 04:00:00
#48 2016-01-03 05:00:00
#49 2016-01-03 06:00:00
#50 2016-01-03 07:00:00
#51 2016-01-03 08:00:00
#52 2016-01-03 09:00:00
#53 2016-01-03 10:00:00
#54 2016-01-03 11:00:00
#55 2016-01-03 12:00:00
#56 2016-01-03 13:00:00
#57 2016-01-03 14:00:00
#58 2016-01-03 15:00:00
#59 2016-01-03 16:00:00
#60 2016-01-03 17:00:00
#61 2016-01-03 18:00:00
#62 2016-01-03 19:00:00
#63 2016-01-03 20:00:00
#64 2016-01-03 21:00:00
#65 2016-01-03 22:00:00
#66 2016-01-03 23:00:00
答案 0 :(得分:1)
您可以使用get_dummies:
import pandas as pd
df = pd.DataFrame({'timestamp': [
'2016-01-01 05:00:00', '2016-01-01 06:00:00', '2016-01-01 07:00:00', '2016-01-01 08:00:00',
'2016-01-02 00:00:00', '2016-01-02 06:00:00', '2016-01-02 07:00:00', '2016-01-02 08:00:00',
'2016-01-03 00:00:00', '2016-01-03 06:00:00', '2016-01-03 07:00:00', '2016-01-03 08:00:00',
'2016-01-04 00:00:00', '2016-01-04 06:00:00', '2016-01-04 07:00:00', '2016-01-04 08:00:00',
]})
# convert to datetime
df.timestamp = pd.to_datetime(df.timestamp)
# extract the day and add 1
df['day'] = df.timestamp.dt.dayofweek + 1 # Thanks @Mark Wang
# create one-hot encoding
df_onehot = pd.get_dummies(df.day, prefix='day')
# merge back
df = pd.concat([df,df_onehot], axis=1)
答案 1 :(得分:0)
### Import Libraries
from datetime import datetime
### Get some Data:
dateList = pd.date_range('2012-04-27 05:00:00 ', periods=24)
df = pd.DataFrame(data = dateList, columns = ['Date'])
### Get Day name
df['Day'] = df['Date'].dt.day_name()
### Generate Day columns
Days = pd.get_dummies(df.Day)
### Stitch the two the original and the Days DataFrames together
df = pd.concat([df.drop(['Day'], axis = 1), Days], axis = 1)
df
Date Friday Monday Saturday Sunday Thursday Tuesday Wednesday
0 2012-04-27 05:00:00 1 0 0 0 0 0 0
1 2012-04-28 05:00:00 0 0 1 0 0 0 0
2 2012-04-29 05:00:00 0 0 0 1 0 0 0
3 2012-04-30 05:00:00 0 1 0 0 0 0 0
4 2012-05-01 05:00:00 0 0 0 0 0 1 0