I am trying to sort a Series of timestamps into several groups.
Variable definitions:
Very old = Date < '20190101'
Current = Today's date as %Y-%m (Year-Month)
Conditions:
1. timestamp < very old
2. Very old < timestamp < current
3. timestamp = current
4. timestamp > current
The Series, extracted from the original DataFrame:
timestamp_dict = \
{0: Timestamp('2019-05-01 00:00:00'),
1: Timestamp('2019-05-01 00:00:00'),
2: Timestamp('2018-12-01 00:00:00'),
3: Timestamp('2019-05-01 00:00:00'),
4: Timestamp('2019-05-01 00:00:00'),
5: Timestamp('2019-05-01 00:00:00'),
6: Timestamp('2019-04-01 00:00:00'),
7: Timestamp('2019-08-01 00:00:00')}
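The datetimes are stored as datetime64[ns].
I suspect that converting the current timestamp to a str is the wrong approach, but I am not sure how to extract the current timestamp in the format %Y-%m.
One idea was to access the parts of the current date (e.g. the month and year as integers) and concatenate them, but then I might run into zero-padding problems: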
_month = dt.datetime.today().month
_year = dt.datetime.today().year
# Would run into zero padding for months 1-9:
current = str(_year) + str(_month)
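The zero-padding concern can be sidestepped with a format specification or with strftime. The following is a minimal sketch, assuming a fixed example date (chosen here only so that the month has a single digit) instead of today's date:

```python
import datetime as dt

# Fixed example date (an assumption for illustration) with a single-digit month
example = dt.datetime(2019, 5, 1)

# Naive concatenation drops the leading zero of the month
naive = str(example.year) + str(example.month)

# A format specification pads the month to two digits
padded = f"{example.year}-{example.month:02d}"

# strftime pads the month as well
formatted = example.strftime('%Y-%m')

print(naive)      # 20195
print(padded)     # 2019-05
print(formatted)  # 2019-05
```

Both padded forms still produce strings, so they help with display but not with comparing against datetime64 values.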
Here I tried to use np.select with the required conditions to generate a new DataFrame column:
import datetime as dt
current = dt.datetime.today().strftime('%Y-%m')
veryold = '20190101'
conditions = [
df.Delivery < veryold,
(df.Delivery >= veryold) | (df.Delivery < current),
df.Delivery == current,
df.Delivery > current
]
outcome = [
'Very old',
'Old',
'Current',
'Future'
]
df['New'] = np.select(conditions, outcome)
df.New
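My expected output is an additional column in my DataFrame that labels each result.
Answer 0 (score: 1)
The idea is to create month periods with Series.dt.to_period, so that the values are compared at YYYY-MM granularity: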
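import numpy as np
import pandas as pd

# Build the frame first so the conditions can reference it;
# timestamp_dict is the dictionary from the question
df = pd.Series(timestamp_dict).to_frame('Delivery')
# Current month as a Period (pd.datetime is no longer available in recent pandas)
current = pd.Timestamp.today().to_period('M')
veryold = pd.Timestamp('20190101')
months = df.Delivery.dt.to_period('M')
conditions = [
    df.Delivery < veryold,
    # both bounds must hold, so combine them with & rather than |
    (df.Delivery >= veryold) & (months < current),
    months == current,
    months > current]
outcome = [
    'Very old',
    'Old',
    'Current',
    'Future'
]
# a string default keeps the result dtype consistent (the implicit default is the integer 0)
df['New'] = np.select(conditions, outcome, default='Unknown')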
print(df)
Delivery New
0 2019-05-01 Old
1 2019-05-01 Old
2 2018-12-01 Very old
3 2019-05-01 Old
4 2019-05-01 Old
5 2019-05-01 Old
6 2019-04-01 Old
7 2019-08-01 Old
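The labels above depend on the date on which the code is run, so a deterministic check can be useful. The following is a minimal sketch, assuming a hypothetical helper `label_deliveries` (not part of the original answer) that takes the reference date as a parameter and combines the two bounds of the 'Old' range with `&`:

```python
import numpy as np
import pandas as pd


def label_deliveries(delivery, today, veryold='2019-01-01'):
    """Label timestamps relative to a cutoff and to the month of `today`.

    Hypothetical helper for illustration; `today` is passed in explicitly
    so that the result does not depend on when the code runs.
    """
    cutoff = pd.Timestamp(veryold)
    current = pd.Timestamp(today).to_period('M')
    months = delivery.dt.to_period('M')
    conditions = [
        delivery < cutoff,
        (delivery >= cutoff) & (months < current),
        months == current,
        months > current,
    ]
    outcome = ['Very old', 'Old', 'Current', 'Future']
    # String default so that every possible value shares one dtype
    return np.select(conditions, outcome, default='Unknown')


delivery = pd.Series(pd.to_datetime(
    ['2019-05-01', '2018-12-01', '2019-04-01', '2019-08-01']))
labels = label_deliveries(delivery, today='2019-05-15')
print(labels.tolist())
# ['Current', 'Very old', 'Old', 'Future']
```

With the reference date fixed at 2019-05-15 (an arbitrary choice), each of the four labels appears once, which makes it easy to confirm that the conditions do not overlap.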