I have a table in which date and time are given in the format year:day_of_year:seconds_of_day
(column 2). A sample file can be found here
ABCD 15:010:00000 2564.6 4.0 -0.380 0.417 -1.313 0.520
ABCD 15:010:00300 2564.3 3.7 -0.389 0.396 -1.318 0.503
ABCD 15:010:00600 2563.9 3.5 -0.397 0.389 -1.324 0.496
ABCD 15:010:00900 2563.9 3.3 -0.411 0.368 -1.322 0.476
ABCD 15:010:01200 2563.8 3.0 -0.425 0.361 -1.320 0.466
ABCD 15:010:01500 2563.9 2.8 -0.432 0.340 -1.312 0.447
ABCD 15:010:01800 2564.3 2.6 -0.439 0.334 -1.304 0.439
I use the pandas package to load the table above into a pandas.DataFrame:
names=['Site', 'Epoch', 'TroTot', 'Stdev','TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']
parser = lambda x: pd.datetime.strptime(x, '%y:%j:%???')
df = pd.read_csv([FILE][1],
header=None,
names=names,
delim_whitespace=True,
parse_dates=['Epoch'],
date_parser=parser)
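I looked at the documentation, but as far as I can tell there is no format directive for seconds of day. How can I change the parser function to handle this format?
Answer 0 (score: 3)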
You can first read your CSV file as is (without parsing dates).
Now you can parse the Epoch column as follows:
In [209]: df['Epoch'] = pd.to_datetime(df['Epoch'].str[:6], format='%y:%j') + \
...: pd.to_timedelta(df['Epoch'].str[7:].astype(int), unit='s')
...:
In [210]: df
Out[210]:
Site Epoch TroTot Stdev TgnTot TgnStd TgeTot TgeStd
0 ABCD 2015-01-10 00:00:00 2564.6 4.0 -0.380 0.417 -1.313 0.520
1 ABCD 2015-01-10 00:05:00 2564.3 3.7 -0.389 0.396 -1.318 0.503
2 ABCD 2015-01-10 00:10:00 2563.9 3.5 -0.397 0.389 -1.324 0.496
3 ABCD 2015-01-10 00:15:00 2563.9 3.3 -0.411 0.368 -1.322 0.476
4 ABCD 2015-01-10 00:20:00 2563.8 3.0 -0.425 0.361 -1.320 0.466
5 ABCD 2015-01-10 00:25:00 2563.9 2.8 -0.432 0.340 -1.312 0.447
6 ABCD 2015-01-10 00:30:00 2564.3 2.6 -0.439 0.334 -1.304 0.439
7 ABCD 2015-01-10 00:35:00 2564.5 2.5 -0.453 0.314 -1.302 0.423
8 ABCD 2015-01-10 00:40:00 2564.2 2.4 -0.467 0.309 -1.299 0.419
9 ABCD 2015-01-10 00:45:00 2563.7 2.3 -0.482 0.287 -1.305 0.404
.. ... ... ... ... ... ... ... ...
278 ABCD 2015-01-10 23:10:00 2561.6 2.2 0.033 0.276 -0.894 0.416
279 ABCD 2015-01-10 23:15:00 2562.1 2.2 0.053 0.271 -0.897 0.418
280 ABCD 2015-01-10 23:20:00 2562.7 2.3 0.073 0.285 -0.899 0.431
281 ABCD 2015-01-10 23:25:00 2562.6 2.3 0.108 0.283 -0.869 0.431
282 ABCD 2015-01-10 23:30:00 2562.7 2.3 0.144 0.299 -0.839 0.442
283 ABCD 2015-01-10 23:35:00 2562.4 2.3 0.175 0.298 -0.824 0.441
284 ABCD 2015-01-10 23:40:00 2562.4 2.3 0.207 0.313 -0.810 0.450
285 ABCD 2015-01-10 23:45:00 2562.1 2.3 0.228 0.314 -0.805 0.453
286 ABCD 2015-01-10 23:50:00 2562.2 2.5 0.249 0.331 -0.801 0.467
287 ABCD 2015-01-10 23:55:00 2562.6 2.7 0.253 0.337 -0.796 0.473
[288 rows x 8 columns]
Check:
In [211]: df.dtypes
Out[211]:
Site object
Epoch datetime64[ns]
TroTot float64
Stdev float64
TgnTot float64
TgnStd float64
TgeTot float64
TgeStd float64
dtype: object
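The two steps above (read the file without date parsing, then build the timestamp from a date part and a seconds part) can be put together in one self-contained sketch. It uses two rows from the question's sample instead of the real file, and it passes `sep=r'\s+'` and `dtype={'Epoch': str}` as assumptions of this sketch rather than the answer's exact call.

```python
from io import StringIO

import pandas as pd

# Two rows copied from the question's sample data (stand-in for the real file).
raw = """\
ABCD 15:010:00000 2564.6 4.0 -0.380 0.417 -1.313 0.520
ABCD 15:010:01800 2564.3 2.6 -0.439 0.334 -1.304 0.439
"""
names = ['Site', 'Epoch', 'TroTot', 'Stdev',
         'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']

# Step 1: read the table as is; Epoch is kept as a plain string column.
df = pd.read_csv(StringIO(raw), header=None, names=names,
                 sep=r'\s+', dtype={'Epoch': str})

# Step 2: the first six characters ('yy:ddd') give the date,
# the characters after the second colon give seconds since midnight.
day = pd.to_datetime(df['Epoch'].str[:6], format='%y:%j')
seconds = pd.to_timedelta(df['Epoch'].str[7:].astype(int), unit='s')
df['Epoch'] = day + seconds

print(df[['Site', 'Epoch']])
```

Because both conversions operate on whole columns, no per-row Python function is called, which tends to matter for large files.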
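Answer 1 (score: 1)
For your data format, the simplest approach is to add the seconds to the date. To pass a parser for this date format to pandas.read_csv(), you can use something like the following:
Code: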
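import datetime as dt

def date_parser(date_string):
    # 'yy:ddd' (two-digit year, day of year) gives midnight of that day
    date = dt.datetime.strptime(date_string[:6], '%y:%j')
    # the remaining field is the number of seconds since midnight
    seconds = dt.timedelta(seconds=int(date_string[7:]))
    return date + seconds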
Test code:
from io import StringIO
import pandas as pd
data = u"""
ABCD 15:010:00000 2564.6 4.0 -0.380 0.417 -1.313 0.520
ABCD 15:010:00300 2564.3 3.7 -0.389 0.396 -1.318 0.503
ABCD 15:010:00600 2563.9 3.5 -0.397 0.389 -1.324 0.496
ABCD 15:010:00900 2563.9 3.3 -0.411 0.368 -1.322 0.476
ABCD 15:010:01200 2563.8 3.0 -0.425 0.361 -1.320 0.466
ABCD 15:010:01500 2563.9 2.8 -0.432 0.340 -1.312 0.447
ABCD 15:010:01800 2564.3 2.6 -0.439 0.334 -1.304 0.439
"""
names=['Site', 'Epoch', 'TroTot', 'Stdev',
'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']
df = pd.read_csv(StringIO(data),
header=None,
names=names,
delim_whitespace=True,
parse_dates=['Epoch'],
date_parser=date_parser)
print(df)
Result:
Site Epoch TroTot Stdev TgnTot TgnStd TgeTot TgeStd
0 ABCD 2015-01-10 00:00:00 2564.6 4.0 -0.380 0.417 -1.313 0.520
1 ABCD 2015-01-10 00:05:00 2564.3 3.7 -0.389 0.396 -1.318 0.503
2 ABCD 2015-01-10 00:10:00 2563.9 3.5 -0.397 0.389 -1.324 0.496
3 ABCD 2015-01-10 00:15:00 2563.9 3.3 -0.411 0.368 -1.322 0.476
4 ABCD 2015-01-10 00:20:00 2563.8 3.0 -0.425 0.361 -1.320 0.466
5 ABCD 2015-01-10 00:25:00 2563.9 2.8 -0.432 0.340 -1.312 0.447
6 ABCD 2015-01-10 00:30:00 2564.3 2.6 -0.439 0.334 -1.304 0.439
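As far as I know, recent pandas releases deprecate the `date_parser` argument (since 2.0) and `delim_whitespace` (since 2.2) of `read_csv`. A minimal sketch of the same idea that avoids both: read `Epoch` as a string and apply the parser afterwards. The helper name `parse_epoch` and the use of `rsplit` instead of fixed slice positions are assumptions of this sketch, not part of the answer above.

```python
import datetime as dt
from io import StringIO

import pandas as pd


def parse_epoch(text):
    """Convert 'yy:ddd:sssss' to a datetime (hypothetical helper name)."""
    # Split off the trailing seconds-of-day field at the last colon.
    day_part, seconds_part = text.rsplit(':', 1)
    day = dt.datetime.strptime(day_part, '%y:%j')
    return day + dt.timedelta(seconds=int(seconds_part))


# Two rows copied from the question's sample data.
raw = """\
ABCD 15:010:00000 2564.6 4.0 -0.380 0.417 -1.313 0.520
ABCD 15:010:00300 2564.3 3.7 -0.389 0.396 -1.318 0.503
"""
names = ['Site', 'Epoch', 'TroTot', 'Stdev',
         'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']

# Read Epoch as text, then convert it row by row after loading.
df = pd.read_csv(StringIO(raw), header=None, names=names,
                 sep=r'\s+', dtype={'Epoch': str})
df['Epoch'] = df['Epoch'].map(parse_epoch)

print(df[['Site', 'Epoch']])
```

Splitting at the last colon keeps the helper working even if the seconds field is not zero-padded to five digits.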