Python Pandas Panel4D resampling

Time: 2015-03-04 05:58:26

Tags: python pandas

I have some four-dimensional hydrological data sampled every 12 hours. I want to compute daily means with the following code:

>>> from datetime import datetime
>>> import pandas
>>> from netCDF4 import Dataset
>>> InNcFile = Dataset(InputFile, 'r')
>>> Time = InNcFile.variables['time'][:]
>>> Latitude = InNcFile.variables['lat'][:]
>>> Longitude = InNcFile.variables['lon'][:]
>>> ZLevel = InNcFile.variables['lvl'][:]
>>> SM = InNcFile.variables['sm'][:,:,:,:]
>>> DateTime = map(lambda x: datetime.strptime(x, '%Y%m%d%H%M'), Time)
>>> df_SMoist = pandas.Panel4D(SM, labels=DateTime, items=ZLevel, major_axis=Latitude, minor_axis=Longitude)
>>> SM.shape
(21, 4, 769, 1024)
>>> df_SMoist.shape
(21, 4, 769, 1024)
>>> df_MeanSM = df_SMoist.resample('D', how='mean', axis=0)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 290, in resample
    return sampler.resample(self)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/tseries/resample.py", line 209, in _resample_timestamps
    grouped = obj.groupby(grouper, axis=self.axis)
  File "/projects/access/apps/pythonlib/pandas/0.12.0/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/core/panelnd.py", line 111, in func
    raise NotImplementedError
NotImplementedError

If I make the SM array 3-dimensional with only one ZLevel (i.e. use a Panel instead of a Panel4D), it works fine. Can you help me figure out what I am doing wrong?

Thanks.

1 answer:

Answer 0 (score: 1)

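Panel4Ds don't (yet?) have an API as feature-rich as DataFrames do. As the traceback shows, resample calls groupby on the object, and for a Panel4D that raises NotImplementedError. You can work around this by loading your 4-dimensional data into a 2-dimensional DataFrame with a MultiIndex.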

For example, if your SM, dates, zlevel, latitude and longitude look like this:

import numpy as np
import pandas as pd

shape = (5,2,3,4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-1-1', periods=shape[0], freq='12H')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])

Then you can build a DataFrame with a MultiIndex like this:

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng])
index.names = ['dates', 'zlevel', 'lat', 'long']
df = pd.DataFrame(SM.ravel(), index=index)
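Flattening SM with ravel() lines up with this index because MultiIndex.from_product enumerates its combinations with the last level varying fastest, which is the same row-major (C) order that ravel() uses by default. A quick sanity check on the same toy data, written as a sketch (it uses freq='12h', the spelling newer pandas versions prefer over '12H', and passes the level names directly to from_product):

```python
import numpy as np
import pandas as pd

# Same toy data as above: axes are (time, zlevel, lat, lng)
shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq='12h')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng],
                                   names=['dates', 'zlevel', 'lat', 'long'])
df = pd.DataFrame(SM.ravel(), index=index)

# Each row label (t, z, y, x) should hold SM[t_pos, z, y, x].
# zlevel/lat/lng values equal their positions here, so they double as indices.
for t_pos, t in enumerate(dates):
    for z in zlevel:
        for y in lat:
            for x in lng:
                assert df.loc[(t, z, y, x), 0] == SM[t_pos, z, y, x]
print("MultiIndex order matches SM.ravel()")
```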

To resample by date, the index needs to be a DatetimeIndex, TimedeltaIndex or PeriodIndex, not a MultiIndex. So we need to move the zlevel, lat and long index levels into the columns:

df = df.unstack(['zlevel', 'lat', 'long'])
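For the real data in the question (21 × 4 × 769 × 1024, about 66 million values), building a four-level MultiIndex with one row per value and then unstacking it may be slow and memory-hungry. A possible shortcut, which is not part of the original answer, is to reshape the array to (time, everything else) and attach the three non-time levels directly as column labels. This sketch gives the same wide layout as the unstack step, minus the extra leading 0 column level; the name wide is just illustrative:

```python
import numpy as np
import pandas as pd

# Toy data with the same layout as above: (time, zlevel, lat, lng)
shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq='12h', name='dates')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])

# One column per (zlevel, lat, long) combination. from_product's order
# (last level fastest) matches the flattened trailing axes of SM.
columns = pd.MultiIndex.from_product([zlevel, lat, lng],
                                     names=['zlevel', 'lat', 'long'])
wide = pd.DataFrame(SM.reshape(shape[0], -1), index=dates, columns=columns)

print(wide.shape)  # (5, 24)
```

Because the index is already a DatetimeIndex, wide can be resampled directly, without ever materialising one row per grid value.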

Now df looks like this:

In [87]: df
Out[87]: 
                      0                                           ...        \
zlevel                0                                           ...     1   
lat                   0                1                   2      ...     0   
long                  0   1   2   3    0    1    2    3    0    1 ...     2   
dates                                                             ...         
2000-01-01 00:00:00   0   1   2   3    4    5    6    7    8    9 ...    14   
2000-01-01 12:00:00  24  25  26  27   28   29   30   31   32   33 ...    38   
2000-01-02 00:00:00  48  49  50  51   52   53   54   55   56   57 ...    62   
2000-01-02 12:00:00  72  73  74  75   76   77   78   79   80   81 ...    86   
2000-01-03 00:00:00  96  97  98  99  100  101  102  103  104  105 ...   110   


zlevel                                                            
lat                         1                   2                 
long                   3    0    1    2    3    0    1    2    3  
dates                                                             
2000-01-01 00:00:00   15   16   17   18   19   20   21   22   23  
2000-01-01 12:00:00   39   40   41   42   43   44   45   46   47  
2000-01-02 00:00:00   63   64   65   66   67   68   69   70   71  
2000-01-02 12:00:00   87   88   89   90   91   92   93   94   95  
2000-01-03 00:00:00  111  112  113  114  115  116  117  118  119  

[5 rows x 24 columns]

Now we can resample by date:

In [88]: df.resample('D', how='mean', axis=0)
Out[88]: 
             0                                           ...                  \
zlevel       0                                           ...     1             
lat          0                1                   2      ...     0         1   
long         0   1   2   3    0    1    2    3    0    1 ...     2    3    0   
dates                                                    ...                   
2000-01-01  12  13  14  15   16   17   18   19   20   21 ...    26   27   28   
2000-01-02  60  61  62  63   64   65   66   67   68   69 ...    74   75   76   
2000-01-03  96  97  98  99  100  101  102  103  104  105 ...   110  111  112   


zlevel                                         
lat                          2                 
long          1    2    3    0    1    2    3  
dates                                          
2000-01-01   29   30   31   32   33   34   35  
2000-01-02   77   78   79   80   81   82   83  
2000-01-03  113  114  115  116  117  118  119  

[3 rows x 24 columns]
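The how='mean' keyword of resample was deprecated in pandas 0.18 in favour of calling the aggregation on the resampler, and was later removed, so on a current pandas the step above is written df.resample('D').mean(). If you then need a 4-D array again (for example to write it back to netCDF), the wide result can be reshaped, because its columns are ordered (zlevel, lat, long) with the last level varying fastest. A sketch on the toy data; the names daily and daily_4d are illustrative:

```python
import numpy as np
import pandas as pd

# Toy data as in the answer: (time, zlevel, lat, lng), sampled every 12 hours
shape = (5, 2, 3, 4)
SM = np.arange(np.prod(shape)).reshape(shape)
dates = pd.date_range('2000-01-01', periods=shape[0], freq='12h')
zlevel = np.arange(shape[1])
lat = np.arange(shape[2])
lng = np.arange(shape[3])

index = pd.MultiIndex.from_product([dates, zlevel, lat, lng],
                                   names=['dates', 'zlevel', 'lat', 'long'])
df = pd.DataFrame(SM.ravel(), index=index).unstack(['zlevel', 'lat', 'long'])

# Current pandas spelling of df.resample('D', how='mean', axis=0)
daily = df.resample('D').mean()

# Make sure the columns are in (zlevel, lat, long) C order before reshaping
daily = daily.sort_index(axis=1)

# Back to a 4-D array with axes (day, zlevel, lat, lng)
daily_4d = daily.to_numpy().reshape((len(daily),) + shape[1:])
print(daily_4d.shape)  # (3, 2, 3, 4)
```

The last day holds only one 12-hourly sample in this toy series, so its "mean" is just that sample; with real data you may want to check how many samples fall into each day (for example with df.resample('D').count()).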