Convert datetime to milliseconds since midnight (UTC or localized) in a CSV file using Pandas

Time: 2016-06-26 23:57:15

Tags: python csv datetime numpy pandas

import pandas as pd
import numpy as np
from datetime import datetime, time


# history file and batch size for processing.

historyFilePath = 'EURUSD.SAMPLE.csv'
batch_size = 5000


# function for date parsing
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')


# load data into a pandas iterator with all the chunks
ratesFromCSVChunks = pd.read_csv(historyFilePath, index_col=0, engine='python', parse_dates=True,
                                 date_parser=dateparse, header=None,
                                 names=["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"],
                                 iterator=True,
                                 chunksize=batch_size)



# concatenate chunks to get the final array
ratesFromCSV = pd.concat([chunk for chunk in ratesFromCSVChunks])

# save final csv file
ratesFromCSV.to_csv('EURUSD_processed.csv', date_format='%Y-%m-%d %H:%M:%S.%f',
                    columns=['1_Current', '2_BidPx', '3_BidSz', '4_AskPx', '5_AskSz'], header=False, float_format='%.5f')

I am reading a CSV file containing forex data in the following format:

    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
    2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000
    2014-08-17 17:00:04.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:05.000000,1.33930,1.33930,1.00000,1.33950,1.00000
    2014-08-17 17:00:06.000000,1.33920,1.33920,1.00000,1.33960,1.00000
    2014-08-17 17:00:06.000000,1.33910,1.33910,1.00000,1.33950,1.00000
    2014-08-17 17:00:08.000000,1.33900,1.33900,1.00000,1.33942,1.00000
    2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000

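How can I convert the datetime, either in the CSV file or in the pandas DataFrame I am reading, to epoch time in milliseconds measured from midnight (UTC or localized)? Each file starts at midnight every day. The only thing that should change is the datetime column, which becomes milliseconds since that day's midnight (UTC or localized). The format I am looking for is: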

    43264234, 1.33910,1.33910,1.00000,1.33930,1.00000
    43264739, 1.33910,1.33910,1.00000,1.33950,1.00000
    43265282, 1.33910,1.33910,1.00000,1.33930,1.00000
    43265789, 1.33900,1.33900,1.00000,1.33940,1.00000
    43266318, 1.33910,1.33910,1.00000,1.33950,1.00000
    43266846, 1.33930,1.33930,1.00000,1.33950,1.00000
    43267353, 1.33920,1.33920,1.00000,1.33960,1.00000
    43267872, 1.33910,1.33910,1.00000,1.33950,1.00000
    43268387, 1.33900,1.33900,1.00000,1.33942,1.00000

Any help is appreciated (short and precise, using Pandas 0.18.1 and numpy 1.11 on Python 3.4 or 3.5 and above).

2 answers:

Answer 0 (score: 2)

This code should do what you want:

# Create some fake data, similar to yours

import pandas as pd
s = pd.Series(pd.date_range('2014-08-17 17:00:01.1230000', periods=4))
print(s)
print(type(s[0]))

# Create a new series using just the date portion of the original data.
# This effectively truncates the time portion. 
# Can't use d = s.dt.date or you'll get date objects back, not datetime64.

d = pd.to_datetime(s.dt.date)
print(d)
print(type(d[0]))

# Calculate the time delta between the original datetime and 
# just the date portion. This is the elapsed time since your epoch.

delta_t = s-d
print(delta_t)

# Display the elapsed time as seconds.

print(delta_t.dt.total_seconds())

This produces the following output:

0   2014-08-17 17:00:01.123
1   2014-08-18 17:00:01.123
2   2014-08-19 17:00:01.123
3   2014-08-20 17:00:01.123
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0   2014-08-17
1   2014-08-18
2   2014-08-19
3   2014-08-20
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0   17:00:01.123000
1   17:00:01.123000
2   17:00:01.123000
3   17:00:01.123000
dtype: timedelta64[ns]
0    61201.123
1    61201.123
2    61201.123
3    61201.123
dtype: float64
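
The question asks for milliseconds rather than seconds. As a minimal sketch on the same fake data, one option is to take midnight with `dt.normalize()` (which keeps the datetime64 dtype) and floor-divide the difference by a one-millisecond `pd.Timedelta`. The names `midnight` and `ms_since_midnight` are only illustrative. For 17:00:01.123 the arithmetic is (17 * 3600 + 1) * 1000 + 123 = 61201123.

```python
import pandas as pd

# Same fake data as above: four timestamps, one day apart.
s = pd.Series(pd.date_range('2014-08-17 17:00:01.123', periods=4))

# normalize() resets each timestamp to midnight of its own day.
midnight = s.dt.normalize()

# Floor-dividing the timedeltas by one millisecond yields whole milliseconds.
ms_since_midnight = (s - midnight) // pd.Timedelta(milliseconds=1)

print(ms_since_midnight.tolist())
```

Integer milliseconds avoid the float rounding that can appear when multiplying `total_seconds()` by 1000.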

Answer 1 (score: 0)

Here is how I did it with my data:

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2011', periods=72, freq='H')
df = pd.DataFrame({"Data": np.random.randn(len(rng))}, index=rng)
df["Time_Since_Midnight"] = (df.index - pd.to_datetime(df.index.date)) / np.timedelta64(1, 'ms')

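Converting the DatetimeIndex to date objects drops the hours, minutes and seconds. Taking the difference between the original index and those dates then gives timedelta64 values, which can be expressed in milliseconds.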

Here is the output I get (the last column is the time since midnight):

2011-01-01 00:00:00  2.383501         0.0
2011-01-01 01:00:00  0.725419   3600000.0
2011-01-01 02:00:00 -0.361533   7200000.0
2011-01-01 03:00:00  2.311185  10800000.0
2011-01-01 04:00:00  1.596148  14400000.0
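
The same idea can be applied to the CSV layout from the question. The following is only a sketch under some assumptions: three sample rows are read from an in-memory string instead of `EURUSD.SAMPLE.csv`, the timestamps are treated as naive (no timezone), and the names `sample`, `rates`, `stamps` and `out` are made up for illustration.

```python
import io
import pandas as pd

# Three rows copied from the sample in the question; a real run would read the CSV file.
sample = io.StringIO(
    "2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000\n"
    "2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000\n"
    "2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000\n"
)
cols = ["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"]
rates = pd.read_csv(sample, header=None, names=cols)

# Parse the timestamps explicitly with the format used in the file.
stamps = pd.to_datetime(rates["datetime"], format="%Y-%m-%d %H:%M:%S.%f")

# Replace the datetime column with whole milliseconds since that day's midnight.
rates["datetime"] = (stamps - stamps.dt.normalize()) // pd.Timedelta(milliseconds=1)

# Write the rows back out as CSV text with five decimals for the float columns.
out = rates.to_csv(index=False, header=False, float_format="%.5f")
print(out)
```

If local midnight is wanted instead of the midnight implied by the raw timestamps, the parsed column could first be made timezone-aware with `dt.tz_localize` and `dt.tz_convert` before normalizing; which timezone applies to this data is not stated in the question.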