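I realize this error is usually caused by a float or decimal being passed to int(); however, I don't understand how it applies to my program. I am using a function called dateparse to parse the dates in a generic file format, so that I can produce a csv file that follows a date_time format. I have the following code: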
import pandas as pd

# Assumed split: the five date fields first, then the data columns
dates = ['Year', 'Month', 'Day', 'Hour', 'Minute']
names = dates + ['T2 (WRF)', 'TSK (WRF)', 'LH (WRF)', 'HS (WRF)', 'Q2 (WRF)',
                 'U10 (WRF)', 'V10 (WRF)', 'PSFC (WRF)', 'ZNT (WRF)',
                 'SWDOWN (WRF)', 'RAINNC (WRF)']

def dateparse(Y, m, d, H, M):
    # Build one timestamp from the five date fields
    a = pd.datetime(int(Y), int(m), int(d), int(H), int(M))
    return a

df = pd.read_csv(filein, delim_whitespace=True,
                 header=None, names=names, parse_dates={'date_time': dates},
                 date_parser=dateparse, index_col='date_time')
df.to_csv(fileout + '.csv', index=True)
My data looks like this, all in the same format:
2014 06 28 12 00 298.406 296.388 8.60505e-05 -11.8442 0.00890335 -0.125414 -0.681333 96967.9 0.79537 0 0
2014 06 28 13 00 298.6 296.854 9.5607e-05 -10.5284 0.00823525 -1.04711 -0.317631 97030 0.79537 19.3502 0
2014 06 28 14 00 301.66 303.488 0.000109433 30.6269 0.00898107 0.000669297 -1.06901 97086.2 0.79537 213.257 0
2014 06 28 15 00 302.186 307.853 0.000169239 128.347 0.00898755 0.993213 -1.16031 97081.7 0.79537 433.372 0
2014 06 28 16 00 303.145 312.31 0.000230749 219.192 0.00874303 0.644703 -0.80952 97137.6 0.79537 639.32 0
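So the headers should be year, month, day, hour, minute. All of these are integers, and I don't understand why this throws an error. I have used this program before on very similar data with the same time format.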
Answer 0 (score: 0)
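My first suspicion was the zero-padded strings, although int() itself accepts them (int('06') returns 6), so the leading zeros alone are unlikely to be the whole story. Either way, you can avoid the per-field int() conversion entirely. If you do the following, it works: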
In [36]:
import io
import pandas as pd
t="""2014 06 28 12 00 298.406 296.388 8.60505e-05 -11.8442 0.00890335 -0.125414 -0.681333 96967.9 0.79537 0 0
2014 06 28 13 00 298.6 296.854 9.5607e-05 -10.5284 0.00823525 -1.04711 -0.317631 97030 0.79537 19.3502 0
2014 06 28 14 00 301.66 303.488 0.000109433 30.6269 0.00898107 0.000669297 -1.06901 97086.2 0.79537 213.257 0
2014 06 28 15 00 302.186 307.853 0.000169239 128.347 0.00898755 0.993213 -1.16031 97081.7 0.79537 433.372 0
2014 06 28 16 00 303.145 312.31 0.000230749 219.192 0.00874303 0.644703 -0.80952 97137.6 0.79537 639.32 0"""
df = pd.read_csv(io.StringIO(t), delim_whitespace=True,
                 parse_dates=[['Year', 'Month', 'Day', 'Hour', 'Minute']],
                 names=['Year', 'Month', 'Day', 'Hour', 'Minute',
                        'T2 (WRF)', 'TSK (WRF)', 'LH (WRF)', 'HS (WRF)', 'Q2 (WRF)',
                        'U10 (WRF)', 'V10 (WRF)', 'PSFC (WRF)', 'ZNT (WRF)',
                        'SWDOWN (WRF)', 'RAINNC (WRF)'])
df
Out[36]:
   Year_Month_Day_Hour_Minute  T2 (WRF)  TSK (WRF)  LH (WRF)  HS (WRF)  \
0            2014 06 28 12 00   298.406    296.388  0.000086  -11.8442
1            2014 06 28 13 00   298.600    296.854  0.000096  -10.5284
2            2014 06 28 14 00   301.660    303.488  0.000109   30.6269
3            2014 06 28 15 00   302.186    307.853  0.000169  128.3470
4            2014 06 28 16 00   303.145    312.310  0.000231  219.1920

   Q2 (WRF)  U10 (WRF)  V10 (WRF)  PSFC (WRF)  ZNT (WRF)  SWDOWN (WRF)  \
0  0.008903  -0.125414  -0.681333     96967.9    0.79537        0.0000
1  0.008235  -1.047110  -0.317631     97030.0    0.79537       19.3502
2  0.008981   0.000669  -1.069010     97086.2    0.79537      213.2570
3  0.008988   0.993213  -1.160310     97081.7    0.79537      433.3720
4  0.008743   0.644703  -0.809520     97137.6    0.79537      639.3200

   RAINNC (WRF)
0             0
1             0
2             0
3             0
4             0
In [39]:
df['Date'] = pd.to_datetime(df['Year_Month_Day_Hour_Minute'], format='%Y %m %d %H %M' )
df
Out[39]:
   Year_Month_Day_Hour_Minute  T2 (WRF)  TSK (WRF)  LH (WRF)  HS (WRF)  \
0            2014 06 28 12 00   298.406    296.388  0.000086  -11.8442
1            2014 06 28 13 00   298.600    296.854  0.000096  -10.5284
2            2014 06 28 14 00   301.660    303.488  0.000109   30.6269
3            2014 06 28 15 00   302.186    307.853  0.000169  128.3470
4            2014 06 28 16 00   303.145    312.310  0.000231  219.1920

   Q2 (WRF)  U10 (WRF)  V10 (WRF)  PSFC (WRF)  ZNT (WRF)  SWDOWN (WRF)  \
0  0.008903  -0.125414  -0.681333     96967.9    0.79537        0.0000
1  0.008235  -1.047110  -0.317631     97030.0    0.79537       19.3502
2  0.008981   0.000669  -1.069010     97086.2    0.79537      213.2570
3  0.008988   0.993213  -1.160310     97081.7    0.79537      433.3720
4  0.008743   0.644703  -0.809520     97137.6    0.79537      639.3200

   RAINNC (WRF)                 Date
0             0  2014-06-28 12:00:00
1             0  2014-06-28 13:00:00
2             0  2014-06-28 14:00:00
3             0  2014-06-28 15:00:00
4             0  2014-06-28 16:00:00
So here I treat the year, month, day, hour and minute as a single column, then call to_datetime and pass format='%Y %m %d %H %M'. You can see that this parses correctly:
In [40]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
Year_Month_Day_Hour_Minute 5 non-null object
T2 (WRF) 5 non-null float64
TSK (WRF) 5 non-null float64
LH (WRF) 5 non-null float64
HS (WRF) 5 non-null float64
Q2 (WRF) 5 non-null float64
U10 (WRF) 5 non-null float64
V10 (WRF) 5 non-null float64
PSFC (WRF) 5 non-null float64
ZNT (WRF) 5 non-null float64
SWDOWN (WRF) 5 non-null float64
RAINNC (WRF) 5 non-null int64
Date 5 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(10), int64(1), object(1)
memory usage: 600.0+ bytes
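If the end goal is the csv file indexed by date_time from the question, the same idea can be sketched without a custom date parser at all. This is only a sketch under a few assumptions: the sample is shortened to two rows, `sep=r"\s+"` stands in for `delim_whitespace=True` (which recent pandas releases deprecate, along with `date_parser` and nested-list `parse_dates`), and the names `sample`, `parts` and `csv_text` are made up for illustration. It relies on `pd.to_datetime` accepting a DataFrame whose columns are named year, month, day, hour and minute.

```python
import io

import pandas as pd

# Two rows copied from the question's data, shortened for illustration.
sample = """2014 06 28 12 00 298.406 296.388 8.60505e-05 -11.8442 0.00890335 -0.125414 -0.681333 96967.9 0.79537 0 0
2014 06 28 13 00 298.6 296.854 9.5607e-05 -10.5284 0.00823525 -1.04711 -0.317631 97030 0.79537 19.3502 0"""

dates = ['Year', 'Month', 'Day', 'Hour', 'Minute']
names = dates + ['T2 (WRF)', 'TSK (WRF)', 'LH (WRF)', 'HS (WRF)', 'Q2 (WRF)',
                 'U10 (WRF)', 'V10 (WRF)', 'PSFC (WRF)', 'ZNT (WRF)',
                 'SWDOWN (WRF)', 'RAINNC (WRF)']

# Read the whitespace-delimited file; the date fields come in as plain integers.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=names)

# to_datetime can assemble timestamps from component columns
# named year / month / day / hour / minute.
parts = df[dates].rename(columns=str.lower)
df['date_time'] = pd.to_datetime(parts)

# Drop the separate date fields and index by the assembled timestamp.
df = df.drop(columns=dates).set_index('date_time')

# With no path, to_csv returns the csv text; pass a filename to write a file.
csv_text = df.to_csv()
print(csv_text.splitlines()[0])
```

To write the file as in the question, `df.to_csv(fileout + '.csv')` would replace the last two lines; the index is written by default.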