Python pandas rolling function with two arguments in a grouped DataFrame

Date: 2017-01-18 09:35:53

Tags: python pandas

This is an extension of my previous question, python pandas rolling function with two arguments.

How can I do the same operation by group? Let's say column 'C' below is used for grouping.

I am trying to:

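  1. Group by column 'C'.
  2. Within each group, sort by 'A'.
  3. Within each group, apply a rolling function that takes two arguments (such as kendalltau) to columns 'A' and 'B'.
  4. The expected result would be a DataFrame like the one below: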

[Image: expected result]
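For reference, the steps listed above can be sketched with a plain loop over the groups, assuming a recent pandas and SciPy. This is an editor's sketch, not the asker's or the answerer's code: it uses `scipy.stats.kendalltau` instead of the masked `mstats` variant, and the helper name `grouped_rolling_tau` is hypothetical. Rows that do not yet have a full window of 5 within their group are left as NaN.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

def grouped_rolling_tau(df, group_col, sort_col, x_col, y_col, window):
    """Rolling Kendall tau of x_col vs y_col within each group, rows ordered by sort_col."""
    out = pd.Series(np.nan, index=df.index, name='tau')
    for _, g in df.groupby(group_col):            # 1. group by 'C'
        g = g.sort_values(sort_col)                # 2. sort each group by 'A'
        x = g[x_col].to_numpy()
        y = g[y_col].to_numpy()
        # 3. one tau per full window; the first window-1 rows of a group stay NaN
        for end in range(window, len(g) + 1):
            tau, _ = kendalltau(x[end - window:end], y[end - window:end])
            out.loc[g.index[end - 1]] = tau
    return out

rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})
dff['tau'] = grouped_rolling_tau(dff, 'C', 'A', 'A', 'B', window=5)
print(dff.sort_values(['C', 'A']))
```

The explicit loop is slow for large frames, but it states the intended result unambiguously, which makes it handy for checking faster versions against.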

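I have been trying the "via index" workaround described in the link above, but the complexity of this case is beyond my skills :-(. This is a toy example that is not very close to what I am working with; for simplicity I used randomly generated data.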

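    import numpy as np
    import pandas as pd
    import scipy as sp
    import scipy.stats  # makes sp.stats available
    
    rand = np.random.RandomState(1)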
    dff = pd.DataFrame({'A' : np.arange(20),
                        'B' : rand.randint(100, 120, 20),
                        'C' : rand.randint(0, 2, 20)})
    
    def my_tau_indx(indx):
        x = dff.iloc[indx, 0]
        y = dff.iloc[indx, 1]
        tau = sp.stats.mstats.kendalltau(x, y)[0]
        return tau
    
    dff['tau'] = dff.sort_values(['C', 'A']).groupby('C').rolling(window = 5).apply(my_tau_indx, args = ([dff.index.values]))
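Why the attempt above fails, as far as can be told: `rolling(...).apply()` calls the function once per window with the values of a single column, so `my_tau_indx` never sees 'A' and 'B' together, and `args` appends extra positional arguments, so `args=([dff.index.values])` would call `my_tau_indx(window, index_array)` with one argument too many. The "via index" workaround rolls over row positions instead and uses each window of positions to look up both columns. A minimal ungrouped sketch, assuming pandas 0.23 or later (for `raw=True`); the names `tau_from_positions` and `tau_all` are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

a = dff['A'].to_numpy()
b = dff['B'].to_numpy()

def tau_from_positions(pos):
    # pos is one window of row positions; rolling hands it over as float64
    pos = pos.astype(int)
    tau, _ = kendalltau(a[pos], b[pos])
    return tau

# Roll over positions 0..n-1 instead of over the data itself (no grouping yet)
positions = pd.Series(np.arange(len(dff)), index=dff.index)
dff['tau_all'] = positions.rolling(5).apply(tau_from_positions, raw=True)
print(dff.head(8))
```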
    

Every fix I try produces another error...

The problem above was solved by Nickil Maveli, and the solution works with numpy 1.11.0, pandas 0.18.1, scipy 0.17.1 and conda 4.1.4. It produces some warnings, but it works.

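On my other machine, with the newest versions (numpy 1.12.0, pandas 0.19.2, scipy 0.18.1, conda 3.10.0 and BLAS/LAPACK), it does not work and I get the traceback below. It seems to be version-related, because after I upgraded the first machine it stopped working too... in the name of science... ;-)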

As Nickil pointed out, this is caused by an incompatibility between numpy 1.11 and 1.12. Downgrading numpy helped. Since I use BLAS/LAPACK on Windows, I installed numpy 1.11.3+mkl from http://www.lfd.uci.edu/~gohlke/pythonlibs/.

    Traceback (most recent call last):
    
    File "<ipython-input-4-bbca2c0e986b>", line 16, in <module>
    t = grp.apply(func)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 651, in apply
    return self._python_apply_general(f)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 655, in _python_apply_general
    self.axis)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 1527, in apply
    res = f(group)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\groupby.py", line 647, in f
    return func(g, *args, **kwargs)
    
    File "<ipython-input-4-bbca2c0e986b>", line 15, in <lambda>
    func = lambda x: pd.Series(pd.rolling_apply(np.arange(len(x)), 5, my_tau_indx), x.index)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 584, in rolling_apply
    kwargs=kwargs)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\stats\moments.py", line 240, in ensure_compat
    result = getattr(r, name)(*args, **kwds)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 863, in apply
    return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 621, in apply
    center=False)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 560, in _apply
    result = calc(values)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 555, in calc
    return func(x, window, min_periods=self.min_periods)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\window.py", line 618, in f
    kwargs)
    
    File "pandas\algos.pyx", line 1831, in pandas.algos.roll_generic (pandas\algos.c:51768)
    
    File "<ipython-input-4-bbca2c0e986b>", line 8, in my_tau_indx
    x = dff.iloc[indx, 0]
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1294, in __getitem__
    return self._getitem_tuple(key)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1560, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=axis)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 1614, in _getitem_axis
    return self._get_loc(key, axis=axis)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\indexing.py", line 96, in _get_loc
    return self.obj._ixs(key, axis=axis)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\core\frame.py", line 1908, in _ixs
    label = self.index[i]
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\range.py", line 510, in __getitem__
    return super_getitem(key)
    
    File "C:\Apps\Anaconda\v2_1_0_x64\envs\python35\lib\site-packages\pandas\indexes\base.py", line 1275, in __getitem__
    result = getitem(key)
    
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
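The traceback ends in an `IndexError` because the rolling machinery hands each window to the function as a float64 array, and NumPy 1.12 turned the earlier deprecation warning for float indices into an error (consistent with the warnings seen on 1.11). Casting the window to integers inside `my_tau_indx`, for example `indx = indx.astype(int)`, should work as an alternative to downgrading NumPy. A NumPy-only illustration, with made-up values:

```python
import numpy as np

values = np.arange(100, 120)
window = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # a window of positions, as float64

try:
    values[window]                 # float indices: IndexError on NumPy >= 1.12
except IndexError as exc:
    print("IndexError:", exc)

print(values[window.astype(int)])  # [100 101 102 103 104]
```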
    

Final check:

[Screenshot: final check]

1 Answer

Answer 0 (score: 1)

One way to do this is to iterate over each group and use pd.rolling_apply on each one:

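import numpy as np
import pandas as pd
import scipy.stats as ss

def my_tau_indx(indx):
    indx = indx.astype(int)  # windows arrive as float64; NumPy >= 1.12 rejects float indices
    x = dff.iloc[indx, 0]
    y = dff.iloc[indx, 1]
    tau = ss.mstats.kendalltau(x, y)[0]
    return tau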

grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
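# Roll over each group's own row labels; with dff's default RangeIndex these equal the
# positions used by iloc in my_tau_indx. (pd.rolling_apply requires pandas < 0.23.)
func = lambda x: pd.Series(pd.rolling_apply(x.index.values, 5, my_tau_indx), x.index)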
t = grp.apply(func)
dff.reindex(t.index).assign(tau=t)


Edit:

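def my_tau_indx(indx):
    # For column 'A', indx is a window of 'A' values, which equal dff's index labels in this
    # example; .get('A') below keeps only that column's result. (.ix was removed in pandas 1.0.)
    x = dff.ix[indx, 0]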
    y = dff.ix[indx, 1]
    tau = ss.mstats.kendalltau(x, y)[0]
    return tau

grp = dff.sort_values(['A', 'C']).groupby('C', group_keys=False)
t = grp.rolling(5).apply(my_tau_indx).get('A')

grp.head(dff.shape[0]).reindex(t.index).assign(tau=t)
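The edit works because, in this toy example, the values of 'A' coincide with dff's index labels, and `.ix` no longer exists in current pandas. A sketch that makes the label trick explicit with `groupby().rolling()`: roll over a helper column holding each row's index label, assuming an integer index (as in the toy data). The names `label` and `tau_from_labels` are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau

rand = np.random.RandomState(1)
dff = pd.DataFrame({'A': np.arange(20),
                    'B': rand.randint(100, 120, 20),
                    'C': rand.randint(0, 2, 20)})

def tau_from_labels(labels):
    rows = dff.loc[labels.astype(int)]      # labels arrive as float64
    tau, _ = kendalltau(rows['A'], rows['B'])
    return tau

# Roll over each row's index label, so every window identifies whole rows of dff
t = (dff.sort_values(['C', 'A'])
        .assign(label=lambda d: d.index)
        .groupby('C')['label']
        .rolling(5)
        .apply(tau_from_labels, raw=True)
        .droplevel('C'))                    # drop the group level added by groupby().rolling()
result = dff.assign(tau=t)
print(result.sort_values(['C', 'A']))
```

Dropping the 'C' level leaves a Series indexed like `dff`, so `assign` aligns the values by label without any `reindex`.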
