I am trying to create a new categorical column 'Stages_So'
and add it to my original dataframe.
Event_Code Timestamp
2053 13/08/2016 11:30
1029 10/09/2016 14:00
2053 02/10/2016 13:15
2053 06/11/2016 16:30
2053 19/11/2016 15:00
2053 03/12/2016 17:30
1029 02/01/2017 15:00
1029 05/02/2017 16:00
2053 11/02/2017 15:00
1029 04/03/2017 15:00
2053 01/04/2017 14:00
1029 21/05/2017 14:00
I tried the following function.
def label_stage(row):
    if row['Timestamp'] > '2016-08-12' and row['Timestamp'] < '2016-11-07':
        return 0
    if row['Timestamp'] > '2016-11-18' and row['Timestamp'] < '2017-02-06':
        return 1
    if row['Timestamp'] > '2017-02-10' and row['Timestamp'] < '2017-05-22':
        return 2

df['Stages_So'] = df.apply(lambda row: label_stage(row), axis=1)
But it raises an error:
TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 957')
Answer 0 (score: 1)
You first need to convert the column to datetime with to_datetime, and then compare datetimes with datetimes:
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

def label_stage(row):
    # Compare Timestamp with Timestamp; rows outside every range return None (shown as NaN)
    if pd.Timestamp('2016-08-12') < row['Timestamp'] < pd.Timestamp('2016-11-07'):
        return 0
    if pd.Timestamp('2016-11-18') < row['Timestamp'] < pd.Timestamp('2017-02-06'):
        return 1
    if pd.Timestamp('2017-02-10') < row['Timestamp'] < pd.Timestamp('2017-05-22'):
        return 2

df['Stages_So'] = df.apply(label_stage, axis=1)
print(df)
Event_Code Timestamp Stages_So
0 2053 2016-08-13 11:30:00 0.0
1 1029 2016-10-09 14:00:00 0.0
2 2053 2016-02-10 13:15:00 NaN
3 2053 2016-06-11 16:30:00 NaN
4 2053 2016-11-19 15:00:00 1.0
5 2053 2016-03-12 17:30:00 NaN
6 1029 2017-02-01 15:00:00 1.0
7 1029 2017-05-02 16:00:00 2.0
8 2053 2017-11-02 15:00:00 NaN
9 1029 2017-04-03 15:00:00 2.0
10 2053 2017-01-04 14:00:00 1.0
11 1029 2017-05-21 14:00:00 2.0
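Note that the printed dates mix two readings of the input: row 0 (13/08/2016) is read day first, while row 1 (10/09/2016) comes out as 2016-10-09, i.e. month first. If the source data is day/month/year throughout, which is an assumption about the file rather than something the question states, passing an explicit format avoids the ambiguity. A minimal sketch on three of the sample values (the name `raw` is only for illustration):

```python
import pandas as pd

# Three values copied from the question's sample data.
raw = pd.Series(['13/08/2016 11:30', '10/09/2016 14:00', '02/10/2016 13:15'])

# Without a format, '10/09/2016' may be read as October 9, depending on
# the pandas version. An explicit format fixes the reading to
# day/month/year (assumed layout of the source data).
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')
print(parsed)
```

With this parsing, all twelve sample rows fall inside one of the three date ranges, so none of them would be labelled NaN.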
Another, faster solution:
import numpy as np
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# Boolean masks, one per stage; a datetime column can be compared with date strings
m1 = (df['Timestamp'] > '2016-08-12') & (df['Timestamp'] < '2016-11-07')
m2 = (df['Timestamp'] > '2016-11-18') & (df['Timestamp'] < '2017-02-06')
m3 = (df['Timestamp'] > '2017-02-10') & (df['Timestamp'] < '2017-05-22')
df['Stages_So'] = np.select([m1, m2, m3], [0, 1, 2], default=np.nan)
print(df)
Event_Code Timestamp Stages_So
0 2053 2016-08-13 11:30:00 0.0
1 1029 2016-10-09 14:00:00 0.0
2 2053 2016-02-10 13:15:00 NaN
3 2053 2016-06-11 16:30:00 NaN
4 2053 2016-11-19 15:00:00 1.0
5 2053 2016-03-12 17:30:00 NaN
6 1029 2017-02-01 15:00:00 1.0
7 1029 2017-05-02 16:00:00 2.0
8 2053 2017-11-02 15:00:00 NaN
9 1029 2017-04-03 15:00:00 2.0
10 2053 2017-01-04 14:00:00 1.0
11 1029 2017-05-21 14:00:00 2.0
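Both solutions yield a float column because rows outside every range are NaN. If an integer or categorical dtype is wanted, as the question's wording suggests, one possible follow-up is to convert afterwards. A sketch using a small stand-in Series (the values are illustrative, not the full output above):

```python
import numpy as np
import pandas as pd

# Stand-in for df['Stages_So'] after either solution (illustrative values).
stages = pd.Series([0.0, 0.0, np.nan, 1.0, 2.0])

# Nullable integer dtype: missing values become <NA> instead of forcing floats.
stages_int = stages.astype('Int64')

# Categorical dtype: categories are the distinct non-missing labels.
stages_cat = stages.astype('category')

print(stages_int.dtype, stages_cat.dtype)
```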