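date_parser fails: invalid literal for int() with base 10: 'month'

Time: 2016-07-18 15:54:03

Tags: python-2.7 pandas int date-parsing

I realize this error usually comes from passing a float or decimal string to int(); however, I don't see how that applies to my program here. I am using a function called dateparse to parse the dates in a common file format so that I can produce CSV files indexed by date_time. I have the following code: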

dates = ['Year', 'Month', 'Day', 'Hour', 'Minute']
names = dates + ['T2 (WRF)', 'TSK (WRF)', 'LH (WRF)', 'HS (WRF)', 'Q2 (WRF)',
                 'U10 (WRF)', 'V10 (WRF)', 'PSFC (WRF)', 'ZNT (WRF)', 'SWDOWN (WRF)', 'RAINNC (WRF)']

def dateparse(Y, m, d, H, M):
    a = pd.datetime(int(Y), int(m), int(d), int(H), int(M))
    return(a)

df=pd.read_csv(filein,delim_whitespace=True,\
header=None, names=names, parse_dates={'date_time': dates},\
date_parser=dateparse, index_col='date_time')
df.to_csv(fileout+'.csv', index=True)

My data looks like this, with every line in the same format:

2014 06 28 12 00 298.406 296.388 8.60505e-05 -11.8442 0.00890335 -0.125414 -0.681333 96967.9 0.79537 0 0
2014 06 28 13 00 298.6 296.854 9.5607e-05 -10.5284 0.00823525 -1.04711 -0.317631 97030 0.79537 19.3502 0
2014 06 28 14 00 301.66 303.488 0.000109433 30.6269 0.00898107 0.000669297 -1.06901 97086.2 0.79537 213.257 0
2014 06 28 15 00 302.186 307.853 0.000169239 128.347 0.00898755 0.993213 -1.16031 97081.7 0.79537 433.372 0
2014 06 28 16 00 303.145 312.31 0.000230749 219.192 0.00874303 0.644703 -0.80952 97137.6 0.79537 639.32 0

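So the headers should be year, month, day, hour, minute. All of these are integers, and I don't understand why this throws an error. I have used this program before on very similar data with the same time format.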
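The error message itself names the value that int() rejected: the literal string 'month'. As a quick check, the sketch below shows which kinds of strings int() accepts. It makes no assumption about the input file, and the helper name `try_int` is hypothetical, introduced only for this illustration.

```python
def try_int(value):
    """Return int(value), or the ValueError message if conversion fails."""
    try:
        return int(value)
    except ValueError as exc:
        return str(exc)

# Zero-padded digit strings such as the month field '06' convert without error.
print(try_int('06'))

# Non-numeric text produces the same message as in the traceback.
print(try_int('month'))
```

If the second call matches the traceback, it may be worth checking whether the file being read contains a line of text (for example a header row) that the sample rows above do not show.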

1 answer:

Answer 0: (score: 0)

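The zero-padded strings are not a problem in themselves (int('06') returns 6), but you can sidestep the custom parser entirely. If you do the following, it works: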

In [36]:

import io
import pandas as pd
t="""2014 06 28 12 00 298.406 296.388 8.60505e-05 -11.8442 0.00890335 -0.125414 -0.681333 96967.9 0.79537 0 0
2014 06 28 13 00 298.6 296.854 9.5607e-05 -10.5284 0.00823525 -1.04711 -0.317631 97030 0.79537 19.3502 0
2014 06 28 14 00 301.66 303.488 0.000109433 30.6269 0.00898107 0.000669297 -1.06901 97086.2 0.79537 213.257 0
2014 06 28 15 00 302.186 307.853 0.000169239 128.347 0.00898755 0.993213 -1.16031 97081.7 0.79537 433.372 0
2014 06 28 16 00 303.145 312.31 0.000230749 219.192 0.00874303 0.644703 -0.80952 97137.6 0.79537 639.32 0"""
names = ['Year', 'Month', 'Day', 'Hour', 'Minute',
         'T2 (WRF)', 'TSK (WRF)', 'LH (WRF)', 'HS (WRF)', 'Q2 (WRF)',
         'U10 (WRF)', 'V10 (WRF)', 'PSFC (WRF)', 'ZNT (WRF)', 'SWDOWN (WRF)', 'RAINNC (WRF)']
# The nested list in parse_dates combines the five date fields into one column
df = pd.read_csv(io.StringIO(t), delim_whitespace=True, names=names,
                 parse_dates=[['Year', 'Month', 'Day', 'Hour', 'Minute']])
df

Out[36]:
  Year_Month_Day_Hour_Minute  T2 (WRF)  TSK (WRF)  LH (WRF)  HS (WRF)  \
0           2014 06 28 12 00   298.406    296.388  0.000086  -11.8442   
1           2014 06 28 13 00   298.600    296.854  0.000096  -10.5284   
2           2014 06 28 14 00   301.660    303.488  0.000109   30.6269   
3           2014 06 28 15 00   302.186    307.853  0.000169  128.3470   
4           2014 06 28 16 00   303.145    312.310  0.000231  219.1920   

   Q2 (WRF)  U10 (WRF)  V10 (WRF)  PSFC (WRF)  ZNT (WRF)  SWDOWN (WRF)  \
0  0.008903  -0.125414  -0.681333     96967.9    0.79537        0.0000   
1  0.008235  -1.047110  -0.317631     97030.0    0.79537       19.3502   
2  0.008981   0.000669  -1.069010     97086.2    0.79537      213.2570   
3  0.008988   0.993213  -1.160310     97081.7    0.79537      433.3720   
4  0.008743   0.644703  -0.809520     97137.6    0.79537      639.3200   

   RAINNC (WRF)  
0             0  
1             0  
2             0  
3             0  
4             0  
In [39]:
df['Date'] = pd.to_datetime(df['Year_Month_Day_Hour_Minute'], format='%Y %m %d %H %M' )
df

Out[39]:
  Year_Month_Day_Hour_Minute  T2 (WRF)  TSK (WRF)  LH (WRF)  HS (WRF)  \
0           2014 06 28 12 00   298.406    296.388  0.000086  -11.8442   
1           2014 06 28 13 00   298.600    296.854  0.000096  -10.5284   
2           2014 06 28 14 00   301.660    303.488  0.000109   30.6269   
3           2014 06 28 15 00   302.186    307.853  0.000169  128.3470   
4           2014 06 28 16 00   303.145    312.310  0.000231  219.1920   

   Q2 (WRF)  U10 (WRF)  V10 (WRF)  PSFC (WRF)  ZNT (WRF)  SWDOWN (WRF)  \
0  0.008903  -0.125414  -0.681333     96967.9    0.79537        0.0000   
1  0.008235  -1.047110  -0.317631     97030.0    0.79537       19.3502   
2  0.008981   0.000669  -1.069010     97086.2    0.79537      213.2570   
3  0.008988   0.993213  -1.160310     97081.7    0.79537      433.3720   
4  0.008743   0.644703  -0.809520     97137.6    0.79537      639.3200   

   RAINNC (WRF)                Date  
0             0 2014-06-28 12:00:00  
1             0 2014-06-28 13:00:00  
2             0 2014-06-28 14:00:00  
3             0 2014-06-28 15:00:00  
4             0 2014-06-28 16:00:00 

So here I read the year, month, day, hour, and minute as a single column, then call to_datetime with format='%Y %m %d %H %M'. You can see that this parses correctly:

In [40]:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
Year_Month_Day_Hour_Minute    5 non-null object
T2 (WRF)                      5 non-null float64
TSK (WRF)                     5 non-null float64
LH (WRF)                      5 non-null float64
HS (WRF)                      5 non-null float64
Q2 (WRF)                      5 non-null float64
U10 (WRF)                     5 non-null float64
V10 (WRF)                     5 non-null float64
PSFC (WRF)                    5 non-null float64
ZNT (WRF)                     5 non-null float64
SWDOWN (WRF)                  5 non-null float64
RAINNC (WRF)                  5 non-null int64
Date                          5 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(10), int64(1), object(1)
memory usage: 600.0+ bytes
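
Putting the pieces together, a minimal sketch of the whole round trip (read, parse, index by date_time, write CSV) might look like the following. It assumes a shortened sample with two rows and two variables. The names `sample`, `date_cols` and `csv_text` are illustrative and do not come from the original post, and `sep=r'\s+'` is used in place of `delim_whitespace=True`.

```python
import io
import pandas as pd

# Shortened sample: two rows, two data columns (an assumption for brevity)
sample = """2014 06 28 12 00 298.406 296.388
2014 06 28 13 00 298.6 296.854"""

date_cols = ['Year', 'Month', 'Day', 'Hour', 'Minute']
names = date_cols + ['T2 (WRF)', 'TSK (WRF)']

# Read the date fields as strings so the zero padding is preserved
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None, names=names,
                 dtype={c: str for c in date_cols})

# Join the five fields with spaces and parse them with an explicit format
joined = df[date_cols].apply(' '.join, axis=1)
df['date_time'] = pd.to_datetime(joined, format='%Y %m %d %H %M')

# Drop the raw date fields and use the parsed timestamps as the index
df = df.drop(columns=date_cols).set_index('date_time')

# to_csv() with no path returns the CSV text instead of writing a file
csv_text = df.to_csv()
print(csv_text)
```

Passing a file path to `to_csv` instead of calling it with no arguments would write the file directly, as in the question's `df.to_csv(fileout+'.csv', index=True)`.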