Handling csv datetime parsing when hour = 24 - pandas

Time: 2016-10-13 16:24:14

Tags: csv parsing datetime pandas

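I am trying to import a time series from a larger csv file by pointing at specific columns; an extract is shown below. The columns have no headers, so I assign them with df_time.columns = ['Year','Month','Day','Hour'].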

2030    1   1   1   2.4
2030    1   1   2   2.1
2030    1   1   3   1.7
2030    1   1   4   1
2030    1   1   5   0.9
2030    1   1   6   1.5
2030    1   1   7   1.1
2030    1   1   8   0.6
2030    1   1   9   1.4
2030    1   1   10  2.2
2030    1   1   11  2
2030    1   1   12  3
2030    1   1   13  2.4
2030    1   1   14  2.6
2030    1   1   15  3.1
2030    1   1   16  2.6
2030    1   1   17  1.9
2030    1   1   18  1.9
2030    1   1   19  2.6
2030    1   1   20  1.7
2030    1   1   21  1.1
2030    1   1   22  1.3
2030    1   1   23  1.4
2030    1   1   24  1.7
2030    1   2   1   2.1

My script works fine for hours 0-23, as follows:

import pandas as pd

def my_import(f):
    # Columns 0-3 hold year, month, day and hour.
    df_time = pd.read_csv(f, skiprows=8, usecols=[0, 1, 2, 3])
    df_time = df_time.astype(int)
    df_time.columns = ['Year', 'Month', 'Day', 'Hour']
    # Build a 'YYYYMMDD HH' string for each row.
    df_time['period'] = df_time.apply(lambda x: str(int(x['Year']))
                                      + str(int(x['Month'])).zfill(2)
                                      + str(int(x['Day'])).zfill(2)
                                      + ' ' + str(int(x['Hour'])).zfill(2), axis=1)
    # The format must match the string built above; %H only accepts 00-23,
    # so rows with Hour == 24 fail at this step.
    df_time.loc[:, 'Date'] = pd.to_datetime(df_time['period'], format='%Y%m%d %H')
    df_time.drop(['Year', 'Month', 'Day', 'Hour', 'period'], axis=1, inplace=True)
    # Column 6 holds the measured value.
    df_DBT = pd.read_csv(f, skiprows=8, usecols=[6])
    df = pd.concat([df_time, df_DBT], axis=1)
    df = df.set_index(['Date'])
    return df

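The problem appears at hour 24, which pandas does not accept. I could easily replace 24 with 0, but the challenge is then adding one day to the date.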
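The failure can be reproduced in isolation. The snippet below is a minimal sketch, not taken from the original script: it assumes the timestamp is a 'YYYYMMDD HH' string and shows that the %H directive accepts 23 but rejects 24.

```python
import pandas as pd

# Hour 23 is within the range that %H accepts.
ok = pd.to_datetime("20300101 23", format="%Y%m%d %H")

# Hour 24 is outside the 00-23 range, so parsing raises ValueError.
try:
    pd.to_datetime("20300101 24", format="%Y%m%d %H")
    parsed = True
except ValueError as exc:
    parsed = False
    print("hour 24 rejected:", exc)
```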

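If I add +1 to the day column before the datetime parsing, every 31st day becomes a 32nd day, which produces more errors. I have tried modifying the script to apply the to_datetime command separately to the date and to the time, but with no luck.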

This is very frustrating!

2 Answers:

Answer 0 (score: 3)

Please don't underestimate the power of pandas!

Demo (using Pandas 0.19.0):

Data:

In [33]: df
Out[33]:
    Year  Month  Day  Hour  Val
0   2030      1    1     1  2.4
1   2030      1    1     2  2.1
2   2030      1    1     3  1.7
3   2030      1    1     4  1.0
4   2030      1    1     5  0.9
5   2030      1    1     6  1.5
6   2030      1    1     7  1.1
7   2030      1    1     8  0.6
8   2030      1    1     9  1.4
9   2030      1    1    10  2.2
10  2030      1    1    11  2.0
11  2030      1    1    12  3.0
12  2030      1    1    13  2.4
13  2030      1    1    14  2.6
14  2030      1    1    15  3.1
15  2030      1    1    16  2.6
16  2030      1    1    17  1.9
17  2030      1    1    18  1.9
18  2030      1    1    19  2.6
19  2030      1    1    20  1.7
20  2030      1    1    21  1.1
21  2030      1    1    22  1.3
22  2030      1    1    23  1.4
23  2030      1    1    24  1.7    # <-----------
24  2030      1    2     1  2.1

Solution:

In [34]: pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour']])
Out[34]:
0    2030-01-01 01:00:00
1    2030-01-01 02:00:00
2    2030-01-01 03:00:00
3    2030-01-01 04:00:00
4    2030-01-01 05:00:00
5    2030-01-01 06:00:00
6    2030-01-01 07:00:00
7    2030-01-01 08:00:00
8    2030-01-01 09:00:00
9    2030-01-01 10:00:00
10   2030-01-01 11:00:00
11   2030-01-01 12:00:00
12   2030-01-01 13:00:00
13   2030-01-01 14:00:00
14   2030-01-01 15:00:00
15   2030-01-01 16:00:00
16   2030-01-01 17:00:00
17   2030-01-01 18:00:00
18   2030-01-01 19:00:00
19   2030-01-01 20:00:00
20   2030-01-01 21:00:00
21   2030-01-01 22:00:00
22   2030-01-01 23:00:00
23   2030-01-02 00:00:00    # <-----------
24   2030-01-02 01:00:00
dtype: datetime64[ns]
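Applied to the import step, the same call might look like the sketch below. The file layout here is an assumption made for illustration (whitespace-delimited, no header row, value in the fifth column), and the skiprows parameter and the "Val" column name are hypothetical; the asker's real file skips 8 rows and reads the value from column 6.

```python
import io

import pandas as pd


def my_import(f, skiprows=0):
    # Assumed layout: year, month, day, hour, value, separated by whitespace.
    df = pd.read_csv(
        f,
        sep=r"\s+",
        header=None,
        skiprows=skiprows,
        usecols=[0, 1, 2, 3, 4],
        names=["Year", "Month", "Day", "Hour", "Val"],
    )
    # Assemble the timestamp from the component columns; hour 24 rolls
    # over to 00:00 of the following day, as in the demo above.
    df["Date"] = pd.to_datetime(df[["Year", "Month", "Day", "Hour"]])
    return df.set_index("Date")[["Val"]]


# A three-row stand-in for the real file.
sample = io.StringIO(
    "2030 1 1 23 1.4\n"
    "2030 1 1 24 1.7\n"
    "2030 1 2 1 2.1\n"
)
result = my_import(sample)
print(result)
```

Reading all the columns in one pass also avoids opening the file twice and concatenating the pieces afterwards.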

Answer 1 (score: 0)

Do this before the datetime parsing code:

import numpy as np

# Hour 24 means 00:00 of the following day.
df_time['Day'] = np.where(df_time.Hour == 24, df_time.Day + 1, df_time.Day)
df_time['Hour'] = np.where(df_time.Hour == 24, 0, df_time.Hour)

# Find the rows whose day now exceeds the length of the month. The mask is
# built once, before Month changes, so Month and Day are updated on the same rows.
# February is treated as 28 days, so leap years are not handled.
overflow = (
    ((df_time.Day > 31) & df_time.Month.isin([1, 3, 5, 7, 8, 10, 12]))
    | ((df_time.Day > 30) & df_time.Month.isin([4, 6, 9, 11]))
    | ((df_time.Day > 28) & (df_time.Month == 2))
)
df_time['Month'] = np.where(overflow, df_time.Month + 1, df_time.Month)
df_time['Day'] = np.where(overflow, 1, df_time.Day)

# Month 13 means January of the following year.
df_time['Year'] = np.where(df_time.Month > 12, df_time.Year + 1, df_time.Year)
df_time['Month'] = np.where(df_time.Month > 12, 1, df_time.Month)
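Note that the rules above treat February as always having 28 days. As a cross-check, and not part of the original answer, the sketch below adds the hour as a timedelta to the parsed date, which leaves the calendar arithmetic to pandas. The two sample rows are made up for illustration; 2032 is a leap year.

```python
import pandas as pd

df_time = pd.DataFrame({
    "Year": [2031, 2032],
    "Month": [2, 2],
    "Day": [28, 28],
    "Hour": [24, 24],
})

# Parse the date part only, then add the hour as an offset.
dates = pd.to_datetime(df_time[["Year", "Month", "Day"]])
stamps = dates + pd.to_timedelta(df_time["Hour"], unit="h")
print(stamps)
# 2031-02-28 24:00 becomes 2031-03-01 00:00
# 2032-02-28 24:00 becomes 2032-02-29 00:00 (leap year)
```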