How to add a time index from date and time columns in pandas

Time: 2014-09-29 09:05:05

Tags: python pandas

I have an OHLC DataFrame like this:

trade_date trade_time  open_price  high_price  low_price  close_price  volumn 
  19911223      15:00       27.70        27.9      27.60        27.80    1270 
  19911224      15:00       27.90        29.3      27.00        29.05    1050 
  19911225      15:00       29.15        30.0      29.10        29.30    2269 
  19911226      15:00       29.30        29.3      28.00        28.00    1918 
  19911227      15:00       28.00        28.5      28.00        28.45    2105 
  19911228      15:00       28.40        29.3      28.40        29.25    1116 
  19911230      15:00       29.30        29.4      28.80        28.80    1059 
  ........

How can I combine the trade_date and trade_time columns into a time series index? I looked at similar questions, but they are all based on read_csv ....

2 answers:

Answer 0: (score: 1)

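Here is a fully vectorized solution.

Convert the trade_date column to datetime64[ns] dtype (it may be int64 or object dtype a priori). Convert trade_time to timedelta64[ns] dtype. The times are given as hh:mm, so you need to append a seconds component for them to parse as a timedelta.

Adding a timedelta to a datetime yields a datetime.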

In [5]: pd.to_datetime(df['trade_date'],format='%Y%m%d') + pd.to_timedelta(df['trade_time'] + ':00')
Out[5]: 
0   1991-12-23 15:00:00
1   1991-12-24 15:00:00
2   1991-12-25 15:00:00
3   1991-12-26 15:00:00
4   1991-12-27 15:00:00
5   1991-12-28 15:00:00
6   1991-12-30 15:00:00
dtype: datetime64[ns]

Then you can set the index directly:

In [6]: df.index = pd.to_datetime(df['trade_date'],format='%Y%m%d') + pd.to_timedelta(df['trade_time'] + ':00')

In [7]: df
Out[7]: 
                     trade_date trade_time  open_price  high_price  low_price  close_price  volumn
1991-12-23 15:00:00    19911223      15:00       27.70        27.9       27.6        27.80    1270
1991-12-24 15:00:00    19911224      15:00       27.90        29.3       27.0        29.05    1050
1991-12-25 15:00:00    19911225      15:00       29.15        30.0       29.1        29.30    2269
1991-12-26 15:00:00    19911226      15:00       29.30        29.3       28.0        28.00    1918
1991-12-27 15:00:00    19911227      15:00       28.00        28.5       28.0        28.45    2105
1991-12-28 15:00:00    19911228      15:00       28.40        29.3       28.4        29.25    1116
1991-12-30 15:00:00    19911230      15:00       29.30        29.4       28.8        28.80    1059
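
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same approach. The three-row frame is a cut-down stand-in for the question's data, and the `astype(str)` cast, the `datetime` index name, and the final `drop` of the two source columns are my own additions rather than part of the answer above.

```python
import pandas as pd

# Cut-down stand-in for the question's frame (assumed values copied from its first rows).
df = pd.DataFrame({
    "trade_date": [19911223, 19911224, 19911225],
    "trade_time": ["15:00", "15:00", "15:00"],
    "close_price": [27.80, 29.05, 29.30],
})

# Cast to str first so the format string applies whether the column is int64 or object.
dates = pd.to_datetime(df["trade_date"].astype(str), format="%Y%m%d")
# Append a seconds component so the hh:mm strings parse as hh:mm:ss timedeltas.
times = pd.to_timedelta(df["trade_time"] + ":00")

# datetime + timedelta gives a datetime Series, which becomes a DatetimeIndex.
df.index = dates + times
df.index.name = "datetime"  # illustrative name, not required

# Optional: the source columns are redundant once the index is set.
df = df.drop(columns=["trade_date", "trade_time"])
```

Dropping the two columns is a matter of taste; keep them if later code still refers to trade_date or trade_time.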

Answer 1: (score: 0)

Assuming trade_date has dtype int64 and trade_time is str, the following works:

In [26]:
# use strptime to parse the combined date and time strings into a datetime
import datetime as dt
def parse_row(x):
    return dt.datetime.strptime(str(x.trade_date) + x.trade_time, '%Y%m%d%H:%M')
# create a datetime column; apply calls parse_row once per row
df['datetime'] = df.apply(parse_row, axis=1)
# set the index to this column; by default set_index also drops it from the columns
df.set_index('datetime', inplace=True)
df
Out[26]:
                     trade_date trade_time  open_price  high_price  low_price  \
datetime                                                                        
1991-12-23 15:00:00    19911223      15:00       27.70        27.9       27.6   
1991-12-24 15:00:00    19911224      15:00       27.90        29.3       27.0   
1991-12-25 15:00:00    19911225      15:00       29.15        30.0       29.1   
1991-12-26 15:00:00    19911226      15:00       29.30        29.3       28.0   
1991-12-27 15:00:00    19911227      15:00       28.00        28.5       28.0   
1991-12-28 15:00:00    19911228      15:00       28.40        29.3       28.4   
1991-12-30 15:00:00    19911230      15:00       29.30        29.4       28.8   

                     close_price  volumn  
datetime                                  
1991-12-23 15:00:00        27.80    1270  
1991-12-24 15:00:00        29.05    1050  
1991-12-25 15:00:00        29.30    2269  
1991-12-26 15:00:00        28.00    1918  
1991-12-27 15:00:00        28.45    2105  
1991-12-28 15:00:00        29.25    1116  
1991-12-30 15:00:00        28.80    1059  

In [27]:

df.index.dtype
Out[27]:
dtype('<M8[ns]')
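
As a side note, the row-wise apply can probably be avoided by concatenating the two columns as strings and parsing them in a single `pd.to_datetime` call. This is only a sketch under the same dtype assumptions (integer trade_date, string trade_time); the two-row frame and the space separator in the format string are choices made for illustration.

```python
import pandas as pd

# Two-row stand-in for the question's frame (assumed values).
df = pd.DataFrame({
    "trade_date": [19911223, 19911224],
    "trade_time": ["15:00", "15:00"],
    "volumn": [1270, 1050],
})

# Build strings such as "19911223 15:00" and parse the whole column at once.
stamp = df["trade_date"].astype(str) + " " + df["trade_time"]
df["datetime"] = pd.to_datetime(stamp, format="%Y%m%d %H:%M")

# set_index moves the new column into the index.
df = df.set_index("datetime")
```

On a large frame this should be noticeably faster than calling strptime per row, since the parsing happens inside pandas rather than in a Python-level loop.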