import pandas as pd
import numpy as np
from datetime import datetime, time
# history file and batch size for processing.
historyFilePath = 'EURUSD.SAMPLE.csv'
batch_size = 5000
# function for date parsing (pd.datetime was an alias of datetime.datetime and is gone in newer pandas)
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
# load data into a pandas iterator with all the chunks
ratesFromCSVChunks = pd.read_csv(historyFilePath, index_col=0, engine='python', parse_dates=True,
                                 date_parser=dateparse, header=None,
                                 names=["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"],
                                 iterator=True,
                                 chunksize=batch_size)
# concatenate chunks to get the final array
ratesFromCSV = pd.concat([chunk for chunk in ratesFromCSVChunks])
# save final csv file
ratesFromCSV.to_csv('EURUSD_processed.csv', date_format='%Y-%m-%d %H:%M:%S.%f',
                    columns=['1_Current', '2_BidPx', '3_BidSz', '4_AskPx', '5_AskSz'], header=False, float_format='%.5f')
I am reading CSV files containing forex data in the following format:
2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
2014-08-17 17:00:02.000000,1.33900,1.33900,1.00000,1.33940,1.00000
2014-08-17 17:00:04.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:05.000000,1.33930,1.33930,1.00000,1.33950,1.00000
2014-08-17 17:00:06.000000,1.33920,1.33920,1.00000,1.33960,1.00000
2014-08-17 17:00:06.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:08.000000,1.33900,1.33900,1.00000,1.33942,1.00000
2014-08-17 17:00:16.000000,1.33900,1.33900,1.00000,1.33940,1.00000
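How can I convert the datetime in the CSV file (or in the pandas DataFrame being read) to EPOCH time in MILLISECONDS since MIDNIGHT (UTC or localized)? Each file starts at midnight of a given day. The only thing that should change is the datetime column, which should become the number of milliseconds since that day's midnight (UTC or localized). The format I am looking for is: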
43264234, 1.33910,1.33910,1.00000,1.33930,1.00000
43264739, 1.33910,1.33910,1.00000,1.33950,1.00000
43265282, 1.33910,1.33910,1.00000,1.33930,1.00000
43265789, 1.33900,1.33900,1.00000,1.33940,1.00000
43266318, 1.33910,1.33910,1.00000,1.33950,1.00000
43266846, 1.33930,1.33930,1.00000,1.33950,1.00000
43267353, 1.33920,1.33920,1.00000,1.33960,1.00000
43267872, 1.33910,1.33910,1.00000,1.33950,1.00000
43268387, 1.33900,1.33900,1.00000,1.33942,1.00000
Any help is appreciated (short and precise answers, please, using Pandas 0.18.1 and numpy 1.11 with Python 3.5, or Python 3.4 and above).
Answer 0 (score: 2)
This code should do what you want:
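For reference, the value being asked for is just the time of day expressed in milliseconds: ms = ((h × 60 + m) × 60 + s) × 1000 + ⌊µs / 1000⌋. Below is a minimal plain-Python sketch of that arithmetic; `ms_since_midnight` is a hypothetical helper name, and anything below one millisecond is truncated. For the first sample row (17:00:01.000000) it gives 61201000, which differs from 43264234 in the desired-output sample, so those figures seem to be placeholders rather than values computed from these rows.

```python
from datetime import datetime


def ms_since_midnight(ts):
    """Hypothetical helper: milliseconds elapsed since midnight of ts's own day."""
    # Fold hours into minutes, minutes into seconds, seconds into milliseconds,
    # then add the microseconds truncated to whole milliseconds.
    return ((ts.hour * 60 + ts.minute) * 60 + ts.second) * 1000 + ts.microsecond // 1000


ts = datetime.strptime('2014-08-17 17:00:01.000000', '%Y-%m-%d %H:%M:%S.%f')
print(ms_since_midnight(ts))  # 61201000
```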
# Create some fake data, similar to yours
import pandas as pd
s = pd.Series(pd.date_range('2014-08-17 17:00:01.1230000', periods=4))
print(s)
print(type(s[0]))
# Create a new series using just the date portion of the original data.
# This effectively truncates the time portion.
# Can't use d = s.dt.date or you'll get date objects back, not datetime64.
d = pd.to_datetime(s.dt.date)
print(d)
print(type(d[0]))
# Calculate the time delta between the original datetime and
# just the date portion. This is the elapsed time since your epoch.
delta_t = s-d
print(delta_t)
# Display the elapsed time as seconds.
print(delta_t.dt.total_seconds())
This produces the following output:
0 2014-08-17 17:00:01.123
1 2014-08-18 17:00:01.123
2 2014-08-19 17:00:01.123
3 2014-08-20 17:00:01.123
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0 2014-08-17
1 2014-08-18
2 2014-08-19
3 2014-08-20
dtype: datetime64[ns]
<class 'pandas.tslib.Timestamp'>
0 17:00:01.123000
1 17:00:01.123000
2 17:00:01.123000
3 17:00:01.123000
dtype: timedelta64[ns]
0 61201.123
1 61201.123
2 61201.123
3 61201.123
dtype: float64
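The answer above stops at seconds as floats, while the question asks for integer milliseconds. A minimal sketch of that last step, assuming a reasonably recent pandas (it is not checked against the 0.18.1 the question pins): `Series.dt.normalize()` sets the time part to midnight while keeping the `datetime64` dtype, so the round trip through Python `date` objects is not needed, and floor division by `pd.Timedelta(milliseconds=1)` keeps the result integral instead of going through `total_seconds()` floats.

```python
import pandas as pd

# Same fake data as the answer above.
s = pd.Series(pd.date_range('2014-08-17 17:00:01.123', periods=4))

# Midnight of each timestamp's own day, still as datetime64.
midnight = s.dt.normalize()

# timedelta64 // one millisecond -> whole milliseconds since midnight.
ms = ((s - midnight) // pd.Timedelta(milliseconds=1)).astype('int64')
print(ms)
```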
Answer 1 (score: 0)
Here is how I did it with my data:
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=72, freq='H')
df = pd.DataFrame({"Data": np.random.randn(len(rng))}, index=rng)
df["Time_Since_Midnight"] = (df.index - pd.to_datetime(df.index.date)) / np.timedelta64(1, 'ms')
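By converting the DatetimeIndex to date objects, we drop the hours, minutes and seconds. Taking the difference between the original index and the truncated one then gives a timedelta64 object, which dividing by np.timedelta64(1, 'ms') expresses in milliseconds.
Here is the output I get (the last column is the time since midnight):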
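The same idea can be written without converting to Python `date` objects. A minimal sketch, assuming a recent pandas: `DatetimeIndex.normalize()` truncates every timestamp to midnight, and floor division by `pd.Timedelta(milliseconds=1)` yields integer milliseconds rather than floats. The hourly range here uses `freq='60min'` only to sidestep the `'H'`/`'h'` alias change between pandas versions.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2011-01-01', periods=72, freq='60min')
df = pd.DataFrame({"Data": np.random.randn(len(rng))}, index=rng)

# Elapsed time since each row's own midnight, as a TimedeltaIndex.
elapsed = df.index - df.index.normalize()

# Whole milliseconds since midnight, kept as int64.
df["Time_Since_Midnight"] = (elapsed // pd.Timedelta(milliseconds=1)).astype('int64')
print(df.head())
```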
2011-01-01 00:00:00 2.383501 0.0
2011-01-01 01:00:00 0.725419 3600000.0
2011-01-01 02:00:00 -0.361533 7200000.0
2011-01-01 03:00:00 2.311185 10800000.0
2011-01-01 04:00:00 1.596148 14400000.0
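Putting the pieces together for the question's own file layout, here is a hedged end-to-end sketch (recent pandas assumed): it reads the headerless CSV in chunks like the question's loader, replaces the datetime column with milliseconds since midnight, and writes rows in the requested order. The in-memory `raw` sample stands in for 'EURUSD.SAMPLE.csv'; `convert` and its `tz` parameter are hypothetical, and the timezone branch assumes the raw timestamps are UTC, which the question does not state.

```python
import io

import pandas as pd

# Three rows in the question's format; in practice pass the CSV path instead.
raw = """2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33930,1.00000
2014-08-17 17:00:01.000000,1.33910,1.33910,1.00000,1.33950,1.00000
2014-08-17 17:00:02.000000,1.33910,1.33910,1.00000,1.33930,1.00000
"""
cols = ["datetime", "1_Current", "2_BidPx", "3_BidSz", "4_AskPx", "5_AskSz"]


def convert(chunk, tz=None):
    """Hypothetical helper: swap the datetime column for ms since midnight."""
    ts = pd.to_datetime(chunk["datetime"], format="%Y-%m-%d %H:%M:%S.%f")
    if tz is not None:
        # Assumption: raw timestamps are UTC; convert so "midnight" is local midnight.
        ts = ts.dt.tz_localize("UTC").dt.tz_convert(tz)
    ms = (ts - ts.dt.normalize()) // pd.Timedelta(milliseconds=1)
    out = chunk.drop(columns="datetime")
    out.insert(0, "ms_since_midnight", ms.astype("int64"))
    return out


# Process chunk by chunk, then stitch the results together.
chunks = pd.read_csv(io.StringIO(raw), header=None, names=cols, chunksize=2)
result = pd.concat([convert(c) for c in chunks], ignore_index=True)

# No header or index; floats keep five decimals as in the input.
csv_text = result.to_csv(header=False, index=False, float_format="%.5f")
print(csv_text)
```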