Pandas date_parser function for year:doy:sod format

Time: 2017-03-11 15:29:51

Tags: python csv pandas datetime

I have a table in which the date and time are given in year:day_of_year:seconds_of_day format (second column). An example file can be found here.

ABCD 15:010:00000 2564.6   4.0  -0.380  0.417  -1.313  0.520
ABCD 15:010:00300 2564.3   3.7  -0.389  0.396  -1.318  0.503
ABCD 15:010:00600 2563.9   3.5  -0.397  0.389  -1.324  0.496
ABCD 15:010:00900 2563.9   3.3  -0.411  0.368  -1.322  0.476
ABCD 15:010:01200 2563.8   3.0  -0.425  0.361  -1.320  0.466
ABCD 15:010:01500 2563.9   2.8  -0.432  0.340  -1.312  0.447
ABCD 15:010:01800 2564.3   2.6  -0.439  0.334  -1.304  0.439

I am using the pandas package to load the table above into a pandas.DataFrame:

names=['Site', 'Epoch', 'TroTot', 'Stdev','TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']
parser = lambda x: pd.datetime.strptime(x, '%y:%j:%???')
df = pd.read_csv([FILE][1], 
                 header=None,
                 names=names,
                 delim_whitespace=True,
                 parse_dates=['Epoch'],
                 date_parser=parser)

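I have looked through the documentation, but as far as I can tell there is no format directive for seconds of day. How should I change the parser function to handle this format?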

2 answers:

Answer 0: (score: 3)

You can first read your CSV file as-is (without parsing dates):
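A minimal sketch of that first step, assuming the same column names as in the question. The two sample rows and the in-memory `sample` buffer are stand-ins for the real file, and `sep=r'\s+'` is used as an equivalent of `delim_whitespace=True`:

```python
from io import StringIO

import pandas as pd

# Two sample rows copied from the question stand in for the real file.
sample = StringIO(
    "ABCD 15:010:00000 2564.6   4.0  -0.380  0.417  -1.313  0.520\n"
    "ABCD 15:010:00300 2564.3   3.7  -0.389  0.396  -1.318  0.503\n"
)

names = ['Site', 'Epoch', 'TroTot', 'Stdev',
         'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']

# No parse_dates / date_parser here: Epoch is read as plain text.
# sep=r'\s+' splits on runs of whitespace, like delim_whitespace=True.
df = pd.read_csv(sample, header=None, names=names, sep=r'\s+')

print(df['Epoch'].tolist())  # ['15:010:00000', '15:010:00300']
```

At this point Epoch still holds the raw 'yy:ddd:sssss' strings, which is what the next step expects.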

Now you can parse the Epoch column as follows:

In [209]: df['Epoch'] = pd.to_datetime(df['Epoch'].str[:6], format='%y:%j') + \
     ...:               pd.to_timedelta(df['Epoch'].str[7:].astype(int), unit='s')
     ...:

In [210]: df
Out[210]:
     Site               Epoch  TroTot  Stdev  TgnTot  TgnStd  TgeTot  TgeStd
0    ABCD 2015-01-10 00:00:00  2564.6    4.0  -0.380   0.417  -1.313   0.520
1    ABCD 2015-01-10 00:05:00  2564.3    3.7  -0.389   0.396  -1.318   0.503
2    ABCD 2015-01-10 00:10:00  2563.9    3.5  -0.397   0.389  -1.324   0.496
3    ABCD 2015-01-10 00:15:00  2563.9    3.3  -0.411   0.368  -1.322   0.476
4    ABCD 2015-01-10 00:20:00  2563.8    3.0  -0.425   0.361  -1.320   0.466
5    ABCD 2015-01-10 00:25:00  2563.9    2.8  -0.432   0.340  -1.312   0.447
6    ABCD 2015-01-10 00:30:00  2564.3    2.6  -0.439   0.334  -1.304   0.439
7    ABCD 2015-01-10 00:35:00  2564.5    2.5  -0.453   0.314  -1.302   0.423
8    ABCD 2015-01-10 00:40:00  2564.2    2.4  -0.467   0.309  -1.299   0.419
9    ABCD 2015-01-10 00:45:00  2563.7    2.3  -0.482   0.287  -1.305   0.404
..    ...                 ...     ...    ...     ...     ...     ...     ...
278  ABCD 2015-01-10 23:10:00  2561.6    2.2   0.033   0.276  -0.894   0.416
279  ABCD 2015-01-10 23:15:00  2562.1    2.2   0.053   0.271  -0.897   0.418
280  ABCD 2015-01-10 23:20:00  2562.7    2.3   0.073   0.285  -0.899   0.431
281  ABCD 2015-01-10 23:25:00  2562.6    2.3   0.108   0.283  -0.869   0.431
282  ABCD 2015-01-10 23:30:00  2562.7    2.3   0.144   0.299  -0.839   0.442
283  ABCD 2015-01-10 23:35:00  2562.4    2.3   0.175   0.298  -0.824   0.441
284  ABCD 2015-01-10 23:40:00  2562.4    2.3   0.207   0.313  -0.810   0.450
285  ABCD 2015-01-10 23:45:00  2562.1    2.3   0.228   0.314  -0.805   0.453
286  ABCD 2015-01-10 23:50:00  2562.2    2.5   0.249   0.331  -0.801   0.467
287  ABCD 2015-01-10 23:55:00  2562.6    2.7   0.253   0.337  -0.796   0.473

[288 rows x 8 columns]

Check the resulting column types:

In [211]: df.dtypes
Out[211]:
Site              object
Epoch     datetime64[ns]
TroTot           float64
Stdev            float64
TgnTot           float64
TgnStd           float64
TgeTot           float64
TgeStd           float64
dtype: object

Answer 1: (score: 1)

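For your data format, the simplest approach is to parse the date part and then add the seconds to it. To pass a parser for this date format to pandas.read_csv(), you can use something like the following: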

Code:

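import datetime as dt

def date_parser(date_string):
    # The first six characters are 'yy:ddd' (two-digit year, day of year).
    date = dt.datetime.strptime(date_string[:6], '%y:%j')
    # Everything after the second colon is the seconds of the day.
    seconds = dt.timedelta(seconds=int(date_string[7:]))
    return date + seconds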
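The slicing above relies on the fixed-width layout (two-digit year, three-digit day of year). A small variant that splits on the colons instead might look like the sketch below. The name `parse_epoch` is hypothetical and not part of the original answer. It uses only the standard library, so it can be tried without pandas:

```python
import datetime as dt

def parse_epoch(epoch):
    # Hypothetical helper: split 'yy:ddd:sssss' on ':' instead of slicing
    # at fixed character positions.
    year, doy, sod = epoch.split(':')
    day = dt.datetime.strptime(f'{year}:{doy}', '%y:%j')
    # Seconds of day are added as a timedelta on top of midnight.
    return day + dt.timedelta(seconds=int(sod))

print(parse_epoch('15:010:00300'))  # 2015-01-10 00:05:00
print(parse_epoch('15:010:86100'))  # 2015-01-10 23:55:00
```

It should be usable in the same place as date_parser, since it also takes a single string and returns a datetime.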

Test code:

from io import StringIO

import pandas as pd

data = u"""
ABCD 15:010:00000 2564.6   4.0  -0.380  0.417  -1.313  0.520
ABCD 15:010:00300 2564.3   3.7  -0.389  0.396  -1.318  0.503
ABCD 15:010:00600 2563.9   3.5  -0.397  0.389  -1.324  0.496
ABCD 15:010:00900 2563.9   3.3  -0.411  0.368  -1.322  0.476
ABCD 15:010:01200 2563.8   3.0  -0.425  0.361  -1.320  0.466
ABCD 15:010:01500 2563.9   2.8  -0.432  0.340  -1.312  0.447
ABCD 15:010:01800 2564.3   2.6  -0.439  0.334  -1.304  0.439
"""

names=['Site', 'Epoch', 'TroTot', 'Stdev',
       'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']

df = pd.read_csv(StringIO(data),
                 header=None,
                 names=names,
                 delim_whitespace=True,
                 parse_dates=['Epoch'],
                 date_parser=date_parser)
print(df)

Result:

   Site               Epoch  TroTot  Stdev  TgnTot  TgnStd  TgeTot  TgeStd
0  ABCD 2015-01-10 00:00:00  2564.6    4.0  -0.380   0.417  -1.313   0.520
1  ABCD 2015-01-10 00:05:00  2564.3    3.7  -0.389   0.396  -1.318   0.503
2  ABCD 2015-01-10 00:10:00  2563.9    3.5  -0.397   0.389  -1.324   0.496
3  ABCD 2015-01-10 00:15:00  2563.9    3.3  -0.411   0.368  -1.322   0.476
4  ABCD 2015-01-10 00:20:00  2563.8    3.0  -0.425   0.361  -1.320   0.466
5  ABCD 2015-01-10 00:25:00  2563.9    2.8  -0.432   0.340  -1.312   0.447
6  ABCD 2015-01-10 00:30:00  2564.3    2.6  -0.439   0.334  -1.304   0.439
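Note that in pandas 2.x the `date_parser` and `delim_whitespace` arguments of read_csv are deprecated, so the code above may emit warnings on newer versions. A hedged sketch of an equivalent that avoids both arguments, assuming the same column layout, is to read Epoch as text and convert it afterwards. The use of `str.rsplit` here is my own choice and not taken from either answer:

```python
from io import StringIO

import pandas as pd

data = """\
ABCD 15:010:00000 2564.6   4.0  -0.380  0.417  -1.313  0.520
ABCD 15:010:00300 2564.3   3.7  -0.389  0.396  -1.318  0.503
ABCD 15:010:00600 2563.9   3.5  -0.397  0.389  -1.324  0.496
"""

names = ['Site', 'Epoch', 'TroTot', 'Stdev',
         'TgnTot', 'TgnStd', 'TgeTot', 'TgeStd']

# sep=r'\s+' replaces delim_whitespace=True; Epoch stays a string column.
df = pd.read_csv(StringIO(data), header=None, names=names, sep=r'\s+')

# Split 'yy:ddd:sssss' at the last colon: column 0 is 'yy:ddd',
# column 1 is the seconds of day.
parts = df['Epoch'].str.rsplit(':', n=1, expand=True)
df['Epoch'] = (pd.to_datetime(parts[0], format='%y:%j')
               + pd.to_timedelta(parts[1].astype(int), unit='s'))

print(df[['Site', 'Epoch']])
```

This does the conversion on whole columns rather than calling a Python function once per row.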