Question

I am trying to add a period column to my dataframe based on date ranges. Here is a sample of my dataframe.

           story                       date        sentiment  price    ccwords  CCWordsCount  fltprice
Story_Num
0          it was a curious choice...  2012-01-16  0          $6.68    1.0      1             6.68
1          when he was a yale ...      2013-04-07  0          $162.30  1.0      2             162.30
2          video bitcoin has real...   2013-04-11  0          $124.90  1.0      5             124.90
3          bitcoin s wild ride may...  2013-04-14  0          $90.00   1.0      7             90.00
4          amid the incense cheap...   2013-05-06  1          $112.30  0.0      0             112.30
5          san francisco eight...      2013-05-29  0          $132.30  1.0      1             132.30

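So I want to add a column "period" where any row dated between January 2009 and April 2013 is period 1, any row from May 2013 to December 2017 is period 2, and any row after January 2018 is period 3.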

I tried:

df9['period'] = '1' if df9['date'] < '4/30/2013'

df9.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 411 entries, 0 to 410
Data columns (total 7 columns):
story              411 non-null object
date               411 non-null datetime64[ns]
sentiment          411 non-null int64
 bitcoin price     411 non-null object
ccwords            411 non-null float64
CCWordsCount       411 non-null int64
fltprice           411 non-null float64
dtypes: datetime64[ns](1), float64(2), int64(2), object(2)
memory usage: 25.7+ KB
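As written, the attempted line df9['period'] = '1' if df9['date'] < '4/30/2013' is a SyntaxError, because a Python conditional expression needs an else branch. Even with an else, a plain if on a whole boolean Series raises ValueError (the truth value of a Series is ambiguous). A minimal sketch of the vectorized equivalent of that single condition uses numpy.where; it assumes date is datetime64, as df9.info() shows, and the frame below is hypothetical, holding only the sample dates.

```python
import numpy as np
import pandas as pd

# Hypothetical frame holding only the sample dates from the question
df9 = pd.DataFrame({"date": pd.to_datetime([
    "2012-01-16", "2013-04-07", "2013-04-11",
    "2013-04-14", "2013-05-06", "2013-05-29",
])})

# Element-wise comparison: one True/False per row (the string is parsed as a date)
before_may_2013 = df9["date"] <= "2013-04-30"

# np.where chooses '1' or '2' row by row instead of a single scalar if/else
df9["period"] = np.where(before_may_2013, "1", "2")
print(df9["period"].tolist())
```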

Answer 1

Use pd.cut with datetime bins:

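import pandas as pd
# Right-closed bins: (2000-01-01, 2013-04-30] -> 1, (2013-04-30, 2018-01-31] -> 2; dates outside become NaN and are filled with 3
bins = pd.to_datetime(['2000-01-01','2013-04-30','2018-01-31'])
df['new'] = pd.cut(df['date'], bins=bins, labels=[1,2]).cat.add_categories([3]).fillna(3)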
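A self-contained run of the pd.cut approach might look like the sketch below. The data is hypothetical: the six dates from the question plus two made-up later dates, so that all three periods appear.

```python
import pandas as pd

# Hypothetical dates: the six from the question plus two later ones
df = pd.DataFrame({"date": pd.to_datetime([
    "2012-01-16", "2013-04-07", "2013-04-11", "2013-04-14",
    "2013-05-06", "2013-05-29", "2017-12-31", "2018-03-01",
])})

bins = pd.to_datetime(["2000-01-01", "2013-04-30", "2018-01-31"])
# Two right-closed intervals labelled 1 and 2
period = pd.cut(df["date"], bins=bins, labels=[1, 2])
# Dates outside both intervals are NaN; add 3 as a category, then fill with it
df["new"] = period.cat.add_categories([3]).fillna(3)
print(df["new"].tolist())
```

Because pd.cut intervals are open on the left, a date of exactly 2000-01-01 would also fall through to 3 unless include_lowest=True is passed. A timestamp later in the day on 2013-04-30 (for example 2013-04-30 10:00) lands in period 2.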

Or use Series.between with numpy.select:

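import numpy as np

# between() includes both endpoints by default
m1 = df['date'].between('2000-01-01','2013-04-30')
m2 = df['date'].between('2013-05-01','2018-01-31')

# rows matching neither mask get the default value 3
df['new'] = np.select([m1, m2], [1,2], default=3)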
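Because the string bounds in between() are parsed as midnight, a timestamp with a time of day on 2013-04-30 (or 2018-01-31) matches neither mask and silently gets the default 3. The sketch below uses hypothetical data to compare the answer's masks with a half-open variant built from >= and <. That variant is a swapped-in alternative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical dates, including one with a time of day on 2013-04-30
df = pd.DataFrame({"date": pd.Series([
    pd.Timestamp("2012-01-16"), pd.Timestamp("2013-04-30 15:00"),
    pd.Timestamp("2013-05-06"), pd.Timestamp("2017-12-31"),
    pd.Timestamp("2018-03-01"),
])})

# The answer's masks: between() is inclusive on both ends
m1 = df["date"].between("2000-01-01", "2013-04-30")
m2 = df["date"].between("2013-05-01", "2018-01-31")
df["new"] = np.select([m1, m2], [1, 2], default=3)

# Half-open variant: [2000-01-01, 2013-05-01) -> 1, [2013-05-01, 2018-02-01) -> 2
h1 = (df["date"] >= "2000-01-01") & (df["date"] < "2013-05-01")
h2 = (df["date"] >= "2013-05-01") & (df["date"] < "2018-02-01")
df["new_half_open"] = np.select([h1, h2], [1, 2], default=3)
print(df)
```

With half-open ranges, each boundary belongs to exactly one period, so no timestamp can fall into a gap between the masks.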
