I'm trying to add a period column to my DataFrame based on date ranges. Here is a sample of my DataFrame:
story date sentiment price ccwords CCWordsCount fltprice
Story_Num
0 it was a curious choice... 2012-01-16 0 $6.68 1.0 1 6.68
1 when he was a yale ... 2013-04-07 0 $162.30 1.0 2 162.30
2 video bitcoin has real... 2013-04-11 0 $124.90 1.0 5 124.90
3 bitcoin s wild ride may... 2013-04-14 0 $90.00 1.0 7 90.00
4 amid the incense cheap... 2013-05-06 1 $112.30 0.0 0 112.30
5 san francisco eight... 2013-05-29 0 $132.30 1.0 1 132.30
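So I want to add a "period" column where any row dated between January 2009 and April 2013 is period 1, May 2013 through December 2017 is period 2, and anything from January 2018 onward is period 3.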
I tried:
df9['period'] = '1' if df9['date'] < '4/30/2013'
df9.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 411 entries, 0 to 410
Data columns (total 7 columns):
story 411 non-null object
date 411 non-null datetime64[ns]
sentiment 411 non-null int64
bitcoin price 411 non-null object
ccwords 411 non-null float64
CCWordsCount 411 non-null int64
fltprice 411 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(2)
memory usage: 25.7+ KB
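For reference, here is a minimal sketch of why the attempt above fails and how a single cut-off can be applied element-wise instead. Python's conditional expression `x if cond` requires an `else`, so the line is a SyntaxError. Even with an `else`, `if` on a whole Series raises "truth value of a Series is ambiguous". numpy.where evaluates the comparison row by row. The three-row df9 is a hypothetical stand-in built from dates in the sample. It uses `<=` so that 2013-04-30 itself counts as period 1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df9, using dates from the sample above
df9 = pd.DataFrame({'date': pd.to_datetime(['2012-01-16', '2013-04-07', '2013-05-06'])})

# Vectorized two-way split: '1' where the comparison is True, '2' otherwise
df9['period'] = np.where(df9['date'] <= '2013-04-30', '1', '2')
print(df9)
```

A two-way numpy.where only covers one boundary. The three periods in the question need the multi-bin approaches below.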
Answer (score: 1)
Use pd.cut with datetime bins:
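import pandas as pd
bins = pd.to_datetime(['2000-01-01', '2013-05-01', '2018-01-01'])  # half-open [start, end) edges
# right=False: up to 2013-04-30 -> 1, 2013-05-01..2017-12-31 -> 2; later dates are NaN, filled as 3
df['period'] = pd.cut(df['date'], bins=bins, labels=[1, 2], right=False).cat.add_categories([3]).fillna(3)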
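The following is a self-contained check of the pd.cut approach. It assumes half-open bins, so that period 2 ends on 2017-12-31 and period 3 starts on 2018-01-01, as the question specifies. Apart from 2012-01-16, the dates are hypothetical boundary cases chosen to exercise each edge. With `right=False`, each bin includes its left edge and excludes its right edge. Anything on or after the last edge falls outside the bins, becomes NaN, and is then filled with 3.

```python
import pandas as pd

# One sample date plus hypothetical dates sitting on each period boundary
df = pd.DataFrame({'date': pd.to_datetime([
    '2012-01-16', '2013-04-30', '2013-05-01', '2017-12-31', '2018-01-01', '2019-06-15'])})

# Bins [2000-01-01, 2013-05-01) -> 1 and [2013-05-01, 2018-01-01) -> 2
bins = pd.to_datetime(['2000-01-01', '2013-05-01', '2018-01-01'])
df['period'] = (pd.cut(df['date'], bins=bins, labels=[1, 2], right=False)
                .cat.add_categories([3])  # make 3 a valid category before filling
                .fillna(3))               # dates past the last edge become period 3
print(df)
```

The result is a categorical column. Call `.astype(int)` on it if plain integers are preferred.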
Or use Series.between with numpy.select:
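import numpy as np
# between() is inclusive on both ends; this assumes the dates carry no time-of-day
m1 = df['date'].between('2000-01-01', '2013-04-30')
m2 = df['date'].between('2013-05-01', '2017-12-31')
df['period'] = np.select([m1, m2], [1, 2], default=3)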