Converting a Pandas DatetimeIndex to a numeric format

Time: 2017-09-30 09:54:19

Tags: python pandas datetime type-conversion

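I want to convert the DatetimeIndex in my DataFrame to float format so that it can be analyzed in my model. Could someone tell me how to do this? Do I need to use the date2num() function? Many thanks!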

5 answers:

Answer 0 (score: 4)

Use astype(float). For example, if you have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'date': ['1998-03-01 00:00:01', '2001-04-01 00:00:01',
                            '1998-06-01 00:00:01', '2001-08-01 00:00:01',
                            '2001-05-03 00:00:01', '1994-03-01 00:00:01']})
df['date'] = pd.to_datetime(df['date'])
df['x'] = list('abcdef')
df = df.set_index('date')

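Then the following returns the index as floats (nanoseconds since the Unix epoch), and pd.to_datetime converts them back: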

df.index.values.astype(float)

array([  8.88710401e+17,   9.86083201e+17,   8.96659201e+17,
     9.96624001e+17,   9.88848001e+17,   7.62480001e+17])

pd.to_datetime(df.index.values.astype(float))

DatetimeIndex(['1998-03-01 00:00:01', '2001-04-01 00:00:01',
           '1998-06-01 00:00:01', '2001-08-01 00:00:01',
           '2001-05-03 00:00:01', '1994-03-01 00:00:01'],
          dtype='datetime64[ns]', freq=None)
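
The floats above are nanoseconds since the Unix epoch (1970-01-01), which is how datetime64[ns] values are stored. A minimal sketch, assuming a model would rather work with seconds; the names idx, nanos, seconds and restored are illustrative and not part of the answer:

```python
import numpy as np
import pandas as pd

# Build the index from explicit nanosecond-resolution values so the unit is unambiguous.
idx = pd.DatetimeIndex(np.array(['1998-03-01T00:00:01', '2001-04-01T00:00:01'],
                                dtype='datetime64[ns]'))

nanos = idx.values.astype('int64')   # integer nanoseconds since 1970-01-01
seconds = nanos / 1e9                # float seconds since 1970-01-01

# Round trip: state which unit the numbers are in when converting back.
restored = pd.to_datetime(nanos, unit='ns')
```

Going through int64 first keeps the exact nanosecond count; a float64 has only about 15 to 16 significant digits, so sub-microsecond detail can be lost once the values are cast to float.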

Answer 1 (score: 3)

Convert to a Timedelta and extract the total seconds with dt.total_seconds():

data = \
{'date': {0: pd.Timestamp('2013-01-01 00:00:00'),
          1: pd.Timestamp('2013-01-02 00:00:00'),
          2: pd.Timestamp('2013-01-03 00:00:00'),
          3: pd.Timestamp('2013-01-04 00:00:00'),
          4: pd.Timestamp('2013-01-05 00:00:00'),
          5: pd.Timestamp('2013-01-06 00:00:00'),
          6: pd.Timestamp('2013-01-07 00:00:00'),
          7: pd.Timestamp('2013-01-08 00:00:00'),
          8: pd.Timestamp('2013-01-09 00:00:00'),
          9: pd.Timestamp('2013-01-10 00:00:00')}}

df = pd.DataFrame.from_dict(data)
df

        date
0 2013-01-01
1 2013-01-02
2 2013-01-03
3 2013-01-04
4 2013-01-05
5 2013-01-06
6 2013-01-07
7 2013-01-08
8 2013-01-09
9 2013-01-10

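# Subtracting the Unix epoch yields a timedelta column; recent pandas versions
# reject datetime64 input to pd.to_timedelta().
(df.date - pd.Timestamp('1970-01-01')).dt.total_seconds()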

0    1.356998e+09
1    1.357085e+09
2    1.357171e+09
3    1.357258e+09
4    1.357344e+09
5    1.357430e+09
6    1.357517e+09
7    1.357603e+09
8    1.357690e+09
9    1.357776e+09
Name: date, dtype: float64

Or perhaps the data is more useful as an int type:

(df.date - pd.Timestamp('1970-01-01')).dt.total_seconds().astype(int)

0    1356998400
1    1357084800
2    1357171200
3    1357257600
4    1357344000
5    1357430400
6    1357516800
7    1357603200
8    1357689600
9    1357776000
Name: date, dtype: int64
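
The question concerns a DatetimeIndex rather than a column. The same idea appears to carry over, with the difference that the resulting TimedeltaIndex exposes total_seconds() directly, without the .dt accessor. A sketch under that assumption, with idx, epoch, elapsed, seconds and days as illustrative names:

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(np.array(['2013-01-01', '2013-01-02', '2013-01-03'],
                                dtype='datetime64[ns]'))
epoch = pd.Timestamp('1970-01-01')

elapsed = idx - epoch                    # TimedeltaIndex
seconds = elapsed.total_seconds()        # float seconds since the epoch
days = elapsed / pd.Timedelta(days=1)    # float days since the epoch
```

Dividing by a Timedelta is a convenient way to pick the unit (days, hours, and so on) that suits the model.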

Answer 2 (score: 3)

I believe this offers another solution, assuming a DataFrame with a DatetimeIndex.

pd.to_numeric(df.index, downcast='float')
# although normally I would prefer an integer, and to coerce errors to NaN
pd.to_numeric(df.index, errors='coerce', downcast='integer')
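
For context, a small sketch of what pd.to_numeric returns for a DatetimeIndex when no downcast is requested: it appears to be the underlying int64 nanosecond counts, the same numbers an explicit astype('int64') gives. The index below is made up for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(np.array(['2013-01-01', '2013-01-02'],
                                dtype='datetime64[ns]'))

as_numeric = pd.to_numeric(idx)    # integer nanoseconds since 1970-01-01
as_int64 = idx.astype('int64')     # explicit cast, for comparison
```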

Answer 3 (score: 0)

I found another solution:

df['date'] = df['date'].astype('datetime64[ns]').astype('int64').astype(float)

Answer 4 (score: 0)

If you only want a specific part of the DatetimeIndex, try the following:

ADDITIONAL = 1
# A pandas DatetimeIndex exposes year/month/day directly; the .dt accessor is for Series.
ddf_c['ts_part_numeric'] = (ddf_c.index.year * (10000 * ADDITIONAL)
                            + ddf_c.index.month * (100 * ADDITIONAL)
                            + ddf_c.index.day * ADDITIONAL)

The output is

20190523
20190524

This can be adjusted to whatever time resolution is needed.
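
One way the resolution could be adjusted, sketched here for an hourly key of the form YYYYMMDDHH. The index and the names idx, ymd and ymdh are assumptions for illustration; the cast to int64 is a precaution, since the date components may come back as 32-bit integers.

```python
import numpy as np
import pandas as pd

idx = pd.DatetimeIndex(np.array(['2019-05-23T14:30', '2019-05-24T09:00'],
                                dtype='datetime64[ns]'))

# Pull out the components as 64-bit integers before scaling them.
year = idx.year.astype('int64')
month = idx.month.astype('int64')
day = idx.day.astype('int64')
hour = idx.hour.astype('int64')

ymd = year * 10000 + month * 100 + day                      # YYYYMMDD
ymdh = year * 1000000 + month * 10000 + day * 100 + hour    # YYYYMMDDHH
```

Keys built this way sort chronologically, but the gaps between them are not proportional to elapsed time, so they suit grouping or labelling better than use as a continuous model feature.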