pandas resample does not work with NumPy 1.7

Date: 2013-05-23 22:17:08

Tags: python csv numpy pandas

This code works on my other computer, which has NumPy 1.6:

import pandas as pd
from pandas import DataFrame
import numpy as np

datapath='C:/Users/Alex/Desktop/samoa/WATERSHED_ANALYSIS/FAGAALU/MasterDataFiles/FP-Master.csv'#):

col_names = ['Date', 'Time', 'TempOut', 'HiTemp', 'LowTemp', 'OutHum', 'DewPt', 'WindSpeed', 'WindDir', 'WindRun', 'HiSpeed', 'HiDir', 'WindChill', 'HeatIndex', 'THWIndex', 'Bar', 'Rain', 'RainRate', 'HeatD-D', 'CoolD-D', 'InTemp', 'InHum', 'InDew', 'InHeat', 'InEMC', 'InAirDensity', 'WindSamp', 'WindTx', 'ISSRecept', 'Arc.Int.']

Wx= pd.read_csv(datapath,skiprows=1,header=0,names=col_names,parse_dates=[['Date','Time']],index_col=['Date_Time'],na_values=['---'])
Wx.index = Wx.index.astype('datetime64')
Wx = Wx.resample('15Min',fill_method='pad',limit=2) ## fill the 30min intervals to 15minute

'Date_Time' is the combination of the CSV columns 'Date' and 'Time', in the format "%m/%d/%Y %I:%M %p".

On my new computer, which has NumPy 1.7, I get this error:

>>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 198, in astype
    return Index(self.values.astype(dtype), name=self.name,
ValueError: Cannot create a NumPy datetime other than NaT with generic units

I tried Wx.index = pd.to_datetime(Wx.index), but it did not convert the index to a DatetimeIndex.

I also tried

Wx.index = Wx['Date_Time'].convert_objects(convert_dates='coerce')

That converts the index to a pandas.tseries.index.DatetimeIndex, but then

Wx.resample('15Min',fill_method='pad',limit=2)

gives this error:

  File "tslib.pyx", line 1978, in pandas.tslib.normalize_date (pandas\tslib.c:30569)
ValueError: month must be in 1..12

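Does anyone know why this isn't working? I have tried using .asfreq('15Min') followed by .fillna('pad'), but that is clunky and would require a lot of recoding in other modules.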

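1 Answer:

Answer 0: (score: 1)

Something funny is happening when the date column is parsed. I would need to see your file (post a link to it, or include part of it in your question). Your parsing code looks fine. Either way, pd.to_datetime will convert what you posted into a DatetimeIndex, which is what resampling needs.

Try

Wx.index = pd.to_datetime(Wx.index.tolist())

Your index should then look like this: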

In [26]: df.index
Out[26]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-07 10:30:00, ..., 2013-05-04 10:30:00]
Length: 6, Freq: None, Timezone: None
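
If the automatic parsing is in doubt, one option is to pass the format from the question ("%m/%d/%Y %I:%M %p") to pd.to_datetime explicitly. This is only a sketch: the two timestamp strings below are made up for illustration and are not taken from the asker's file.

```python
import pandas as pd

# Hypothetical sample strings in the format described in the question.
stamps = pd.Series(["05/23/2013 10:17 PM", "05/23/2013 10:47 PM"])

# An explicit format avoids any guessing about month/day order or AM/PM.
parsed = pd.to_datetime(stamps, format="%m/%d/%Y %I:%M %p")

# Wrap the result so it can be assigned as a frame's index.
index = pd.DatetimeIndex(parsed)
print(index)
```

With an explicit format, a string that does not match raises an error instead of being silently misread, which can help narrow down where the file deviates.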

Here is an example:

In [15]: index = pd.to_datetime('1/7/2012 10:30 AM, 1/7/2012 11:00 AM, 1/7/2012 11:30 AM, 5/4/2013 10:00 AM, 5/4/2013 10:15 AM, 5/4/2013 10:30 AM'.split(', '))

In [16]: df = DataFrame(randn(6,2),index=index)

In [17]: df
Out[17]: 
                            0         1
2012-01-07 10:30:00  0.523777 -0.093911
2012-01-07 11:00:00  0.954344  0.830551
2012-01-07 11:30:00 -0.004064 -1.831855
2013-05-04 10:00:00 -1.082163  1.426966
2013-05-04 10:15:00 -1.025252 -0.169916
2013-05-04 10:30:00  1.717222 -0.988228

In [18]: df.resample('15Min',fill_method='pad',limit=2).head(10)
Out[18]: 
                            0         1
2012-01-07 10:30:00  0.523777 -0.093911
2012-01-07 10:45:00  0.523777 -0.093911
2012-01-07 11:00:00  0.954344  0.830551
2012-01-07 11:15:00  0.954344  0.830551
2012-01-07 11:30:00 -0.004064 -1.831855
2012-01-07 11:45:00 -0.004064 -1.831855
2012-01-07 12:00:00 -0.004064 -1.831855
2012-01-07 12:15:00       NaN       NaN
2012-01-07 12:30:00       NaN       NaN
2012-01-07 12:45:00       NaN       NaN

In [19]: np.__version__
Out[19]: '1.7.1'
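
The fill_method and limit keywords in In [18] belong to the pandas release used in this session. For readers on more recent pandas versions, where (as far as I know) resample no longer accepts fill_method, a rough equivalent is to chain .ffill(limit=2) onto the resampler. The sketch below uses made-up values in place of randn so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same six timestamps as the example session, parsed with an explicit format.
index = pd.to_datetime(
    [
        "1/7/2012 10:30 AM", "1/7/2012 11:00 AM", "1/7/2012 11:30 AM",
        "5/4/2013 10:00 AM", "5/4/2013 10:15 AM", "5/4/2013 10:30 AM",
    ],
    format="%m/%d/%Y %I:%M %p",
)

# Placeholder values (0.0 .. 11.0) standing in for the random data above.
df = pd.DataFrame(np.arange(12.0).reshape(6, 2), index=index)

# Upsample to 15-minute bins and forward-fill at most two consecutive gaps.
upsampled = df.resample("15min").ffill(limit=2)
print(upsampled.head(10))
```

As in Out[18], the 11:30 row is carried forward to 11:45 and 12:00 only, and the rows from 12:15 onward stay NaN because the limit of two fills has been reached.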

Here is a sample file that I parsed (the same way you did):

In [32]: pd.read_csv('foo.csv',index_col=['Date_Time'],parse_dates=[['Date','Time']])
Out[32]: 
                            0         1
Date_Time                              
2012-01-07 10:30:00  0.523777 -0.093911
2012-01-07 11:00:00  0.954344  0.830551
2012-01-07 11:30:00 -0.004064 -1.831855
2013-05-04 10:00:00 -1.082163  1.426966
2013-05-04 10:15:00 -1.025252 -0.169916
2013-05-04 10:30:00  1.717222 -0.988228

In [33]: !cat 'foo.csv'
Date,Time,0,1
2012-01-07,10:30:00 AM,0.5237774067993367,-0.0939112810613334
2012-01-07,11:00:00 AM,0.9543438182818779,0.8305511332193324
2012-01-07,11:30:00 AM,-0.004064420703945425,-1.8318551051738328
2013-05-04,10:00:00 AM,-1.082162936479846,1.4269663822610816
2013-05-04,10:15:00 AM,-1.0252522955053849,-0.16991623915937284
2013-05-04,10:30:00 AM,1.7172224344229594,-0.9882282095859544

Perhaps something is misaligned in your file, or you have some odd characters embedded in the date/time fields?
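
One way to look for such rows is to parse the combined date/time strings with errors="coerce" and then list the values that came back as NaT. This is a sketch under assumptions: the strings below are invented to show the idea (one with an out-of-range month, one with a stray trailing character) and are not from your file.

```python
import pandas as pd

# Hypothetical raw values; rows 1 and 2 are deliberately malformed.
raw = pd.Series([
    "5/4/2013 10:00 AM",
    "13/4/2013 10:15 AM",   # month out of range
    "5/4/2013 10:30 AMx",   # stray trailing character
])

# Unparseable values become NaT instead of raising.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M %p", errors="coerce")

# Show the offending originals; repr() makes hidden characters visible.
bad_rows = raw[parsed.isna()]
print(bad_rows.map(repr))
```

Running the same check on the real 'Date' and 'Time' columns (read as plain strings and joined with a space) should point to the lines worth inspecting by hand.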