I have an OHLC DataFrame like this:
trade_date  trade_time  open_price  high_price  low_price  close_price  volumn
  19911223       15:00       27.70        27.9      27.60        27.80    1270
  19911224       15:00       27.90        29.3      27.00        29.05    1050
  19911225       15:00       29.15        30.0      29.10        29.30    2269
  19911226       15:00       29.30        29.3      28.00        28.00    1918
  19911227       15:00       28.00        28.5      28.00        28.45    2105
  19911228       15:00       28.40        29.3      28.40        29.25    1116
  19911230       15:00       29.30        29.4      28.80        28.80    1059
........
How can I combine the trade_date and trade_time columns into a time-series index? I looked at similar questions, but they all rely on read_csv ....
Answer 0 (score: 1)
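Here is a fully vectorized solution.
Convert the trade_date column to datetime64[ns] dtype (it may be int64 or object dtype a priori). Convert trade_time to timedelta64[ns] dtype. The times are given as hh:mm, so you need to hint at that format by appending a seconds component.
Adding the datetime and the timedelta yields a datetime.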
In [5]: pd.to_datetime(df['trade_date'],format='%Y%m%d') + pd.to_timedelta(df['trade_time'] + ':00')
Out[5]:
0 1991-12-23 15:00:00
1 1991-12-24 15:00:00
2 1991-12-25 15:00:00
3 1991-12-26 15:00:00
4 1991-12-27 15:00:00
5 1991-12-28 15:00:00
6 1991-12-30 15:00:00
dtype: datetime64[ns]
You can then set the index directly:
In [6]: df.index = pd.to_datetime(df['trade_date'],format='%Y%m%d') + pd.to_timedelta(df['trade_time'] + ':00')
In [7]: df
Out[7]:
trade_date trade_time open_price high_price low_price close_price volumn
1991-12-23 15:00:00 19911223 15:00 27.70 27.9 27.6 27.80 1270
1991-12-24 15:00:00 19911224 15:00 27.90 29.3 27.0 29.05 1050
1991-12-25 15:00:00 19911225 15:00 29.15 30.0 29.1 29.30 2269
1991-12-26 15:00:00 19911226 15:00 29.30 29.3 28.0 28.00 1918
1991-12-27 15:00:00 19911227 15:00 28.00 28.5 28.0 28.45 2105
1991-12-28 15:00:00 19911228 15:00 28.40 29.3 28.4 29.25 1116
1991-12-30 15:00:00 19911230 15:00 29.30 29.4 28.8 28.80 1059
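The steps above can also be sketched as a standalone script. This is only an illustration under a few assumptions: the frame is rebuilt from three of the question's rows, trade_date is cast to str first so the same line works for both int64 and object columns, and the result name `indexed` and the index name `datetime` are made up for the example. It also drops the two source columns once they are in the index, which the transcript above does not do.

```python
import pandas as pd

# Three rows copied from the question; trade_date is int64, trade_time is str.
df = pd.DataFrame({
    "trade_date": [19911223, 19911224, 19911225],
    "trade_time": ["15:00", "15:00", "15:00"],
    "close_price": [27.80, 29.05, 29.30],
})

# Parse the date part; casting to str first covers int64 and object columns alike.
dates = pd.to_datetime(df["trade_date"].astype(str), format="%Y%m%d")
# Append ":00" so the hh:mm strings are parsed as hh:mm:ss timedeltas.
times = pd.to_timedelta(df["trade_time"] + ":00")

# Use the sum as the index and drop the now-redundant source columns.
# `indexed` and the index name "datetime" are hypothetical names for this sketch.
indexed = df.set_index(dates + times).drop(columns=["trade_date", "trade_time"])
indexed.index.name = "datetime"
```

Using set_index on a copy rather than assigning to df.index leaves the original frame untouched, which can be convenient when experimenting.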
Answer 1 (score: 0)
Assuming trade_date has dtype int64 and trade_time is str, the following should work:
In [26]:
# use strptime to parse the concatenated date and time strings into a datetime
import datetime as dt
def datetime(x):
    return dt.datetime.strptime(str(x.trade_date) + '' + x.trade_time, '%Y%m%d%H:%M')
# create a datetime column; apply calls the conversion once per row
df['datetime'] = df.apply(lambda row: datetime(row), axis=1)
# set the index to this datetime; by default the column becomes the index and is dropped as a column
df.set_index('datetime',inplace=True)
df
Out[26]:
trade_date trade_time open_price high_price low_price \
datetime
1991-12-23 15:00:00 19911223 15:00 27.70 27.9 27.6
1991-12-24 15:00:00 19911224 15:00 27.90 29.3 27.0
1991-12-25 15:00:00 19911225 15:00 29.15 30.0 29.1
1991-12-26 15:00:00 19911226 15:00 29.30 29.3 28.0
1991-12-27 15:00:00 19911227 15:00 28.00 28.5 28.0
1991-12-28 15:00:00 19911228 15:00 28.40 29.3 28.4
1991-12-30 15:00:00 19911230 15:00 29.30 29.4 28.8
close_price volumn
datetime
1991-12-23 15:00:00 27.80 1270
1991-12-24 15:00:00 29.05 1050
1991-12-25 15:00:00 29.30 2269
1991-12-26 15:00:00 28.00 1918
1991-12-27 15:00:00 28.45 2105
1991-12-28 15:00:00 29.25 1116
1991-12-30 15:00:00 28.80 1059
In [27]:
df.index.dtype
Out[27]:
dtype('<M8[ns]')
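A row-wise apply calls strptime once per row in Python, which tends to be slow on long histories. A possible variant, not taken from either answer, is to concatenate the two columns as strings and let pd.to_datetime parse the whole column with an explicit format. The sketch below assumes the same dtypes as above (int64 trade_date, str trade_time) and uses two of the question's rows; the variable name `combined` is made up for the example.

```python
import pandas as pd

# Two rows copied from the question.
df = pd.DataFrame({
    "trade_date": [19911223, 19911224],
    "trade_time": ["15:00", "15:00"],
    "volumn": [1270, 1050],
})

# Build strings such as "19911223 15:00", then parse the whole column at once.
combined = df["trade_date"].astype(str) + " " + df["trade_time"]
df.index = pd.to_datetime(combined, format="%Y%m%d %H:%M")
df.index.name = "datetime"
```

Passing format explicitly avoids per-value format guessing and fails loudly if a row does not match the expected layout.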