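I'm trying to import a time series from a larger csv file by pointing to specific columns; an extract is shown below. The columns have no headers, so I label them with df_time.columns = ['Year','Month','Day','Hour'].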
2030 1 1 1 2.4
2030 1 1 2 2.1
2030 1 1 3 1.7
2030 1 1 4 1
2030 1 1 5 0.9
2030 1 1 6 1.5
2030 1 1 7 1.1
2030 1 1 8 0.6
2030 1 1 9 1.4
2030 1 1 10 2.2
2030 1 1 11 2
2030 1 1 12 3
2030 1 1 13 2.4
2030 1 1 14 2.6
2030 1 1 15 3.1
2030 1 1 16 2.6
2030 1 1 17 1.9
2030 1 1 18 1.9
2030 1 1 19 2.6
2030 1 1 20 1.7
2030 1 1 21 1.1
2030 1 1 22 1.3
2030 1 1 23 1.4
2030 1 1 24 1.7
2030 1 2 1 2.1
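A minimal sketch of reading rows like the extract above with the labels assigned at load time. It assumes the extract is whitespace-separated as displayed (for a comma-separated file, drop `sep`). The name `Val` for the value column is borrowed from the first answer below, and `raw` is just a few rows copied from the extract:

```python
import io

import pandas as pd

# A few rows copied from the extract (no header row)
raw = """2030 1 1 22 1.3
2030 1 1 23 1.4
2030 1 1 24 1.7
2030 1 2 1 2.1
"""

# header=None stops pandas from treating the first data row as column names;
# names= assigns the labels directly instead of setting df.columns afterwards
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["Year", "Month", "Day", "Hour", "Val"])
print(df)
```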
My script works fine for hours 0-23; here it is:
import pandas as pd

def my_import(f):
    # Read the four date/time columns (the file has no header row)
    df_time = pd.read_csv(f, skiprows=8, header=None, usecols=[0, 1, 2, 3])
    df_time = df_time.astype(int)
    df_time.columns = ['Year', 'Month', 'Day', 'Hour']
    # Build a 'YYYYMMDD HH' string for each row
    df_time['period'] = df_time.apply(lambda x: str(x['Year'])
                                      + str(x['Month']).zfill(2)
                                      + str(x['Day']).zfill(2)
                                      + ' ' + str(x['Hour']).zfill(2), axis=1)
    # Fails when Hour == 24: %H only accepts 00-23
    df_time['Date'] = pd.to_datetime(df_time['period'], format='%Y%m%d %H')
    df_time.drop(['Year', 'Month', 'Day', 'Hour', 'period'], axis=1, inplace=True)
    # Read the value column and attach it to the timestamps
    df_DBT = pd.read_csv(f, skiprows=8, header=None, usecols=[6])
    df = pd.concat([df_time, df_DBT], axis=1)
    df = df.set_index(['Date'])
    return df
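The problem comes at hour 24, which pandas does not recognize. I could easily replace 24 with 0, but the catch is that the day then has to advance by one as well.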
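A minimal reproduction of the failure, assuming the `'%Y%m%d %H'` string format used in the script: `%H` only covers hours 00 to 23, so the row with hour 24 is rejected with a `ValueError` (the exact message varies between pandas versions).

```python
import pandas as pd

# Parsing works for hours 00-23 ...
ok = pd.to_datetime(pd.Series(["20300101 23"]), format="%Y%m%d %H")
print(ok[0])  # 2030-01-01 23:00:00

# ... but %H has no hour 24, so this string is rejected
try:
    pd.to_datetime(pd.Series(["20300101 24"]), format="%Y%m%d %H")
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```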
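If I instead add 1 to the Day value before parsing the datetime, the 31st of a month becomes a nonexistent 32nd day, which produces even more errors.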
I have tried modifying the script to apply the to_datetime command to the date and the time separately, but with no luck.
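For reference, a sketch of a split date/time approach that sidesteps the 32nd-day problem: parse only Year/Month/Day, which always form a valid calendar date, and then add Hour as a timedelta so that 24 hours carries into the next day. The small frame `df` below holds illustrative rows, not data from the real file:

```python
import pandas as pd

# Illustrative rows: hour 24 at the end of a day and at the end of the year
df = pd.DataFrame({
    "Year":  [2030, 2030, 2030, 2030],
    "Month": [1, 1, 12, 12],
    "Day":   [1, 1, 31, 31],
    "Hour":  [23, 24, 23, 24],
})

# Parse only the calendar date (always valid, since Day is never incremented)
date = pd.to_datetime(df[["Year", "Month", "Day"]])

# Add the hour as an offset; 24 hours carries into the next day,
# including across month and year boundaries
df["Date"] = date + pd.to_timedelta(df["Hour"], unit="h")
print(df["Date"].tolist())
```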
This is very frustrating!
Answer 0 (score: 3)
Don't underestimate the power of Pandas!
Demo (using Pandas 0.19.0):
Data:
In [33]: df
Out[33]:
Year Month Day Hour Val
0 2030 1 1 1 2.4
1 2030 1 1 2 2.1
2 2030 1 1 3 1.7
3 2030 1 1 4 1.0
4 2030 1 1 5 0.9
5 2030 1 1 6 1.5
6 2030 1 1 7 1.1
7 2030 1 1 8 0.6
8 2030 1 1 9 1.4
9 2030 1 1 10 2.2
10 2030 1 1 11 2.0
11 2030 1 1 12 3.0
12 2030 1 1 13 2.4
13 2030 1 1 14 2.6
14 2030 1 1 15 3.1
15 2030 1 1 16 2.6
16 2030 1 1 17 1.9
17 2030 1 1 18 1.9
18 2030 1 1 19 2.6
19 2030 1 1 20 1.7
20 2030 1 1 21 1.1
21 2030 1 1 22 1.3
22 2030 1 1 23 1.4
23 2030 1 1 24 1.7 # <-----------
24 2030 1 2 1 2.1
Solution:
In [34]: pd.to_datetime(df[['Year', 'Month', 'Day', 'Hour']])
Out[34]:
0 2030-01-01 01:00:00
1 2030-01-01 02:00:00
2 2030-01-01 03:00:00
3 2030-01-01 04:00:00
4 2030-01-01 05:00:00
5 2030-01-01 06:00:00
6 2030-01-01 07:00:00
7 2030-01-01 08:00:00
8 2030-01-01 09:00:00
9 2030-01-01 10:00:00
10 2030-01-01 11:00:00
11 2030-01-01 12:00:00
12 2030-01-01 13:00:00
13 2030-01-01 14:00:00
14 2030-01-01 15:00:00
15 2030-01-01 16:00:00
16 2030-01-01 17:00:00
17 2030-01-01 18:00:00
18 2030-01-01 19:00:00
19 2030-01-01 20:00:00
20 2030-01-01 21:00:00
21 2030-01-01 22:00:00
22 2030-01-01 23:00:00
23 2030-01-02 00:00:00 # <-----------
24 2030-01-02 01:00:00
dtype: datetime64[ns]
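A self-contained sketch of the same idea applied end to end, in the spirit of the question's `my_import` (read, build `Date`, drop the component columns, set the index). It assumes the whitespace-separated layout shown in the question's extract and the `Val` column name from the demo; the rows in `raw` are copied from the extract:

```python
import io

import pandas as pd

# Rows copied from the extract, including the hour-24 row
raw = """2030 1 1 23 1.4
2030 1 1 24 1.7
2030 1 2 1 2.1
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["Year", "Month", "Day", "Hour", "Val"])

# Assemble timestamps straight from the component columns;
# as in the demo above, hour 24 rolls over to 00:00 of the next day
df["Date"] = pd.to_datetime(df[["Year", "Month", "Day", "Hour"]])
df = df.drop(columns=["Year", "Month", "Day", "Hour"]).set_index("Date")
print(df)
```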
Answer 1 (score: 0)
Do this before the datetime-parsing code:
import numpy as np
# Roll hour 24 over to hour 0 of the next day
is_24 = df_time.Hour == 24
df_time['Day'] = np.where(is_24, df_time.Day + 1, df_time.Day)
df_time['Hour'] = np.where(is_24, 0, df_time.Hour)

# Length of each month (index 0 unused), plus one day for February in leap years
days_in_month = np.array([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
year = df_time.Year
leap = ((year % 4 == 0) & (year % 100 != 0)) | (year % 400 == 0)
month_len = days_in_month[df_time.Month] + np.where(leap & (df_time.Month == 2), 1, 0)

# Compute the overflow mask once, before Month changes, so Day and Month stay in step
overflow = df_time.Day > month_len
df_time['Month'] = np.where(overflow, df_time.Month + 1, df_time.Month)
df_time['Day'] = np.where(overflow, 1, df_time.Day)

# Roll month 13 over to January of the next year
new_year = df_time.Month > 12
df_time['Year'] = np.where(new_year, df_time.Year + 1, df_time.Year)
df_time['Month'] = np.where(new_year, 1, df_time.Month)
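A rough way to sanity-check manual rollover code like this is to compare it against `pd.to_datetime` on edge cases. The sketch below wraps a corrected version of the steps above (one overflow mask computed before `Month` changes, plus a leap-year rule) in a hypothetical helper `roll_hour_24`; the test rows are illustrative:

```python
import numpy as np
import pandas as pd


def roll_hour_24(df_time):
    """Hypothetical helper: manual hour-24 rollover, applied to a copy."""
    df_time = df_time.copy()
    is_24 = df_time.Hour == 24
    df_time["Day"] = np.where(is_24, df_time.Day + 1, df_time.Day)
    df_time["Hour"] = np.where(is_24, 0, df_time.Hour)
    # Month lengths (index 0 unused), with February extended in leap years
    days_in_month = np.array([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    year = df_time.Year
    leap = ((year % 4 == 0) & (year % 100 != 0)) | (year % 400 == 0)
    month_len = days_in_month[df_time.Month] + np.where(leap & (df_time.Month == 2), 1, 0)
    # One mask, computed before Month is modified
    overflow = df_time.Day > month_len
    df_time["Month"] = np.where(overflow, df_time.Month + 1, df_time.Month)
    df_time["Day"] = np.where(overflow, 1, df_time.Day)
    new_year = df_time.Month > 12
    df_time["Year"] = np.where(new_year, df_time.Year + 1, df_time.Year)
    df_time["Month"] = np.where(new_year, 1, df_time.Month)
    return df_time


# End of a 30-day month, February in a non-leap and a leap year,
# the last hour of the year, and an ordinary row
cases = pd.DataFrame({
    "Year":  [2030, 2030, 2032, 2030, 2030],
    "Month": [4, 2, 2, 12, 1],
    "Day":   [30, 28, 28, 31, 15],
    "Hour":  [24, 24, 24, 24, 5],
})

cols = ["Year", "Month", "Day", "Hour"]
manual = pd.to_datetime(roll_hour_24(cases)[cols])
reference = pd.to_datetime(cases[cols])
print((manual == reference).all())  # True
```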