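datetime.date creates many problems with set_index, groupby, and apply in Pandas 0.8.1

Time: 2013-10-25 14:25:39

Tags: python pandas datetime pandas-groupby

I am using Pandas 0.8.1 in an environment that cannot be upgraded for bureaucratic reasons.

Before reading everything about the original problem and goal, you may want to skip down to the "Simplified problem" section below.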

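My goal: group a DataFrame by a categorical column "D", then, for each group, sort by a date column "dt", set the index to "dt", run a rolling OLS regression, and return a DataFrame of the regression coefficients (beta) indexed by date.

The end result would presumably be a bunch of stacked beta frames, each one specific to a particular value of the categorical variable, so the final index would have two levels: one for the category ID and one for the date.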
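For readers on a recent pandas, where `pandas.ols` is no longer available, the goal above can be sketched roughly as follows. This is only an illustration under stated assumptions: `rolling_beta` is a hypothetical helper, the regression is done with `numpy.polyfit` rather than `pandas.ols`, and the window handling is not guaranteed to match what `pandas.ols` does.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n = 731
df = pd.DataFrame({
    "x": rng.rand(n),
    "y1": rng.rand(n),
    "d": rng.randint(0, 2, size=n),
    "dt": pd.date_range("2000-01-01", periods=n, freq="D"),
})

def rolling_beta(frame, window=60, min_periods=12):
    """Hypothetical helper: rolling OLS of y1 on x, one row of coefficients per date."""
    frame = frame.sort_values("dt").set_index("dt")
    rows = []
    for end in range(len(frame)):
        start = max(0, end - window + 1)
        chunk = frame.iloc[start:end + 1]
        if len(chunk) < min_periods:
            continue  # not enough observations yet
        # degree-1 polyfit returns (slope, intercept)
        slope, intercept = np.polyfit(chunk["x"].to_numpy(), chunk["y1"].to_numpy(), 1)
        rows.append((chunk.index[-1], slope, intercept))
    out = pd.DataFrame(rows, columns=["dt", "x", "intercept"])
    return out.set_index("dt")

# Stack the per-group results under a two-level (category, date) index.
betas = pd.concat(
    {key: rolling_beta(group) for key, group in df.groupby("d")},
    names=["d", "dt"],
)
```

Looping over the groups and concatenating with keys sidesteps `apply` entirely, which also makes it easier to see which group a failure comes from.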

If I do

my_dataframe.groupby("D").apply(some_wrapped_OLS_caller)

then I often run into a frustratingly uninformative KeyError: 0, and the traceback appears to choke on a datetime issue:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
   2287             arrays.append(level)
   2288
-> 2289         index = MultiIndex.from_arrays(arrays, names=keys)
   2290
   2291         if verify_integrity and not index.is_unique:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1505         if len(arrays) == 1:
   1506             name = None if names is None else names[0]
-> 1507             return Index(arrays[0], name=name)
   1508
   1509         cats = [Categorical.from_array(arr) for arr in arrays]

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    102         if dtype is None:
    103             if (lib.is_datetime_array(subarr)
--> 104                 or lib.is_datetime64_array(subarr)
    105                 or lib.is_timestamp_array(subarr)):
    106                 from pandas.tseries.index import DatetimeIndex

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    427     def __getitem__(self, key):
    428         try:
--> 429             return self.index.get_value(self, key)
    430         except InvalidIndexError:
    431             pass

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    639         """
    640         try:
--> 641             return self._engine.get_value(series, key)
    642         except KeyError, e1:
    643             if len(self) > 0 and self.inferred_type == 'integer':

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()

KeyError: 0

If I perform the regression steps manually on each group of the groupby object, one at a time, everything works without a hitch.

Code:

import numpy as np
import pandas
import datetime
from dateutil.relativedelta import relativedelta as drr

def foo(zz):
    zz1 = zz.sort("dt", ascending=True).set_index("dt")
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
    return r1.beta

dfrm_test = pandas.DataFrame({"x":np.random.rand(731), 
                              "y1":np.random.rand(731),
                              "y2":np.random.rand(731), 
                              "z":np.random.rand(731)})

dfrm_test['d'] = np.random.randint(0,2, size= (len(dfrm_test),))
dfrm_test['dt'] = [datetime.date(2000, 1, 1) + drr(days=i) 
                   for i in range(len(dfrm_test))]

Now, here is what happens when I try to process this with groupby and apply:

In [102]: dfrm_test.groupby("d").apply(foo)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-102-345a8d45df50> in <module>()
----> 1 dfrm_test.groupby("d").apply(foo)

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    267         applied : type depending on grouped object and function
    268         """
--> 269         return self._python_apply_general(func, *args, **kwargs)
    270
    271     def aggregate(self, func, *args, **kwargs):

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs)
    402             group_axes = _get_axes(group)
    403
--> 404             res = func(group, *args, **kwargs)
    405
    406             if not _is_indexed_like(res, group_axes):

<ipython-input-101-8b9184c63365> in foo(zz)
      1 def foo(zz):
----> 2     zz1 = zz.sort("dt", ascending=True).set_index("dt")
      3     r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
      4     return r1.beta

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
   2287             arrays.append(level)
   2288
-> 2289         index = MultiIndex.from_arrays(arrays, names=keys)
   2290
   2291         if verify_integrity and not index.is_unique:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1505         if len(arrays) == 1:
   1506             name = None if names is None else names[0]
-> 1507             return Index(arrays[0], name=name)
   1508
   1509         cats = [Categorical.from_array(arr) for arr in arrays]

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    102         if dtype is None:
    103             if (lib.is_datetime_array(subarr)
--> 104                 or lib.is_datetime64_array(subarr)
    105                 or lib.is_timestamp_array(subarr)):
    106                 from pandas.tseries.index import DatetimeIndex

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    427     def __getitem__(self, key):
    428         try:
--> 429             return self.index.get_value(self, key)
    430         except InvalidIndexError:
    431             pass

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    639         """
    640         try:
--> 641             return self._engine.get_value(series, key)
    642         except KeyError, e1:
    643             if len(self) > 0 and self.inferred_type == 'integer':

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()

KeyError: 0

If I save the groupby object and try to apply foo to each group myself in the straightforward way, that fails too:

In [103]: grps = dfrm_test.groupby("d")

In [104]: for grp in grps:
    foo(grp[1])
   .....:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-104-f215ff55c12b> in <module>()
      1 for grp in grps:
----> 2     foo(grp[1])
      3

<ipython-input-101-8b9184c63365> in foo(zz)
      1 def foo(zz):
----> 2     zz1 = zz.sort("dt", ascending=True).set_index("dt")
      3     r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
      4     return r1.beta

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
   2287             arrays.append(level)
   2288
-> 2289         index = MultiIndex.from_arrays(arrays, names=keys)
   2290
   2291         if verify_integrity and not index.is_unique:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1505         if len(arrays) == 1:
   1506             name = None if names is None else names[0]
-> 1507             return Index(arrays[0], name=name)
   1508
   1509         cats = [Categorical.from_array(arr) for arr in arrays]

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    102         if dtype is None:
    103             if (lib.is_datetime_array(subarr)
--> 104                 or lib.is_datetime64_array(subarr)
    105                 or lib.is_timestamp_array(subarr)):
    106                 from pandas.tseries.index import DatetimeIndex

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    427     def __getitem__(self, key):
    428         try:
--> 429             return self.index.get_value(self, key)
    430         except InvalidIndexError:
    431             pass

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    639         """
    640         try:
--> 641             return self._engine.get_value(series, key)
    642         except KeyError, e1:
    643             if len(self) > 0 and self.inferred_type == 'integer':

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()

KeyError: 0

But if I store one of the group DataFrames and then call foo on it, it works just fine... ??

In [105]: for grp in grps:
    x = grp[1]
   .....:

In [106]: x.head()
Out[106]:
          x        y1        y2         z          dt  d
0  0.240858  0.235135  0.196027  0.940180  2000-01-01  1
1  0.115784  0.802576  0.870014  0.482418  2000-01-02  1
2  0.081640  0.939411  0.344041  0.846485  2000-01-03  1
5  0.608413  0.100349  0.306595  0.739987  2000-01-06  1
6  0.429635  0.678575  0.449520  0.362761  2000-01-07  1

In [107]: foo(x)
Out[107]:
<class 'pandas.core.frame.DataFrame'>
Index: 360 entries, 2000-01-17 to 2001-12-29
Data columns:
x            360  non-null values
intercept    360  non-null values
dtypes: float64(2)

What is going on here? Does this have to do with the logic that converts to a date/time type being triggered mistakenly? How can I work around it?
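One way to see what type pandas assigns to a column of `datetime.date` values is sketched below. It assumes a recent pandas (the `pd.api.types` helpers are not available in 0.8.1), and the small `dates` Series is made up for illustration.

```python
import datetime
import pandas as pd

# A column of plain datetime.date objects, like the "dt" column in the question.
dates = pd.Series(
    [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(5)]
)

print(dates.dtype)                       # generic object dtype, not a datetime dtype
print(pd.api.types.infer_dtype(dates))   # what pandas infers the objects to be

# Converting up front gives a native datetime64 column instead.
converted = pd.to_datetime(dates)
print(converted.dtype)
```

If the environment allows it, converting the column once before grouping avoids relying on type inference inside `set_index`.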

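Simplified problem

I can reduce the problem to just the set_index call inside the applied function. But this gets really weird. Here is an example using a simpler test DataFrame, with only set_index.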

In [154]: tdf = pandas.DataFrame(
    {"dt":([datetime.date(2000,1,i+1) for i in range(12)] + 
           [datetime.date(2001,3,j+1) for j in range(13)]), 
     "d":np.random.randint(1,4,(25,)), 
     "x":np.random.rand(25)})

In [155]: tdf
Out[155]:
    d          dt         x
0   1  2000-01-01  0.430667
1   3  2000-01-02  0.159652
2   1  2000-01-03  0.719015
3   1  2000-01-04  0.175328
4   3  2000-01-05  0.233810
5   3  2000-01-06  0.581176
6   1  2000-01-07  0.912615
7   1  2000-01-08  0.534971
8   3  2000-01-09  0.373345
9   1  2000-01-10  0.182665
10  1  2000-01-11  0.286681
11  3  2000-01-12  0.054054
12  3  2001-03-01  0.861348
13  1  2001-03-02  0.093717
14  2  2001-03-03  0.729503
15  1  2001-03-04  0.888558
16  1  2001-03-05  0.263055
17  1  2001-03-06  0.558430
18  3  2001-03-07  0.064216
19  3  2001-03-08  0.018823
20  3  2001-03-09  0.207845
21  2  2001-03-10  0.735640
22  2  2001-03-11  0.908427
23  2  2001-03-12  0.819994
24  2  2001-03-13  0.798267

set_index works fine here, with no change to the dates or anything else.

In [156]: tdf.set_index("dt")
Out[156]:
            d         x
dt
2000-01-01  1  0.430667
2000-01-02  3  0.159652
2000-01-03  1  0.719015
2000-01-04  1  0.175328
2000-01-05  3  0.233810
2000-01-06  3  0.581176
2000-01-07  1  0.912615
2000-01-08  1  0.534971
2000-01-09  3  0.373345
2000-01-10  1  0.182665
2000-01-11  1  0.286681
2000-01-12  3  0.054054
2001-03-01  3  0.861348
2001-03-02  1  0.093717
2001-03-03  2  0.729503
2001-03-04  1  0.888558
2001-03-05  1  0.263055
2001-03-06  1  0.558430
2001-03-07  3  0.064216
2001-03-08  3  0.018823
2001-03-09  3  0.207845
2001-03-10  2  0.735640
2001-03-11  2  0.908427
2001-03-12  2  0.819994
2001-03-13  2  0.798267

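groupby cannot successfully set_index (note that the error occurs before any problem with unpacking inconsistently sized results could arise; it simply cannot set the index at all).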

In [157]: tdf.groupby("d").apply(lambda x: x.set_index("dt"))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-157-cf2d3964f4d3> in <module>()
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt"))

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
    267         applied : type depending on grouped object and function
    268         """
--> 269         return self._python_apply_general(func, *args, **kwargs)
    270
    271     def aggregate(self, func, *args, **kwargs):

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs)
    402             group_axes = _get_axes(group)
    403
--> 404             res = func(group, *args, **kwargs)
    405
    406             if not _is_indexed_like(res, group_axes):

<ipython-input-157-cf2d3964f4d3> in <lambda>(x)
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt"))

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
   2287             arrays.append(level)
   2288
-> 2289         index = MultiIndex.from_arrays(arrays, names=keys)
   2290
   2291         if verify_integrity and not index.is_unique:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1505         if len(arrays) == 1:
   1506             name = None if names is None else names[0]
-> 1507             return Index(arrays[0], name=name)
   1508
   1509         cats = [Categorical.from_array(arr) for arr in arrays]

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    102         if dtype is None:
    103             if (lib.is_datetime_array(subarr)
--> 104                 or lib.is_datetime64_array(subarr)
    105                 or lib.is_timestamp_array(subarr)):
    106                 from pandas.tseries.index import DatetimeIndex

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    427     def __getitem__(self, key):
    428         try:
--> 429             return self.index.get_value(self, key)
    430         except InvalidIndexError:
    431             pass

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    639         """
    640         try:
--> 641             return self._engine.get_value(series, key)
    642         except KeyError, e1:
    643             if len(self) > 0 and self.inferred_type == 'integer':

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()

KeyError: 0

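The really strange part

Here I save the groupby object and try to call set_index manually. This does not work. Even when I save a specific DataFrame element from the groups, it still does not work.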

In [159]: grps = tdf.groupby("d")

In [160]: grps
Out[160]: <pandas.core.groupby.DataFrameGroupBy at 0x7600bd0>

In [161]: grps_list = [(x,y) for x,y in grps]

In [162]: grps_list[2][1].set_index("dt")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-162-77f985a6e063> in <module>()
----> 1 grps_list[2][1].set_index("dt")

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
   2287             arrays.append(level)
   2288
-> 2289         index = MultiIndex.from_arrays(arrays, names=keys)
   2290
   2291         if verify_integrity and not index.is_unique:

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
   1505         if len(arrays) == 1:
   1506             name = None if names is None else names[0]
-> 1507             return Index(arrays[0], name=name)
   1508
   1509         cats = [Categorical.from_array(arr) for arr in arrays]

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
    102         if dtype is None:
    103             if (lib.is_datetime_array(subarr)
--> 104                 or lib.is_datetime64_array(subarr)
    105                 or lib.is_timestamp_array(subarr)):
    106                 from pandas.tseries.index import DatetimeIndex

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    427     def __getitem__(self, key):
    428         try:
--> 429             return self.index.get_value(self, key)
    430         except InvalidIndexError:
    431             pass

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    639         """
    640         try:
--> 641             return self._engine.get_value(series, key)
    642         except KeyError, e1:
    643             if len(self) > 0 and self.inferred_type == 'integer':

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()

/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()

KeyError: 0

But if I build a direct copy of that group's DataFrame by hand, then set_index works on the manual reconstruction??

In [163]: grps_list[2][1]
Out[163]:
    d          dt         x
1   3  2000-01-02  0.159652
4   3  2000-01-05  0.233810
5   3  2000-01-06  0.581176
8   3  2000-01-09  0.373345
11  3  2000-01-12  0.054054
12  3  2001-03-01  0.861348
18  3  2001-03-07  0.064216
19  3  2001-03-08  0.018823
20  3  2001-03-09  0.207845

In [165]: recreation = pandas.DataFrame(
    {"d":[3,3,3,3,3,3,3,3,3], 
     "dt":[datetime.date(2000,1,2), datetime.date(2000,1,5), datetime.date(2000,1,6),
           datetime.date(2000,1,9), datetime.date(2000,1,12), datetime.date(2001,3,1),
           datetime.date(2001,3,7), datetime.date(2001,3,8), datetime.date(2001,3,9)], 
     "x":[0.159, 0.233, 0.581, 0.3733, 0.054, 0.861, 0.064, 0.0188, 0.2078]})

In [166]: recreation
Out[166]:
   d          dt       x
0  3  2000-01-02  0.1590
1  3  2000-01-05  0.2330
2  3  2000-01-06  0.5810
3  3  2000-01-09  0.3733
4  3  2000-01-12  0.0540
5  3  2001-03-01  0.8610
6  3  2001-03-07  0.0640
7  3  2001-03-08  0.0188
8  3  2001-03-09  0.2078

In [167]: recreation.set_index("dt")
Out[167]:
            d       x
dt
2000-01-02  3  0.1590
2000-01-05  3  0.2330
2000-01-06  3  0.5810
2000-01-09  3  0.3733
2000-01-12  3  0.0540
2001-03-01  3  0.8610
2001-03-07  3  0.0640
2001-03-08  3  0.0188
2001-03-09  3  0.2078
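One possible reading of the outputs above, which the thread itself does not confirm: the failing group keeps the original row labels (1, 4, 5, ...), so it has no label 0, whereas the hand-built copy is labeled 0 through 8. The traceback ends in `Series.__getitem__` raising `KeyError: 0`, which is what a label lookup for 0 would produce on such a Series. A minimal sketch of that difference on a recent pandas, with made-up data and a hypothetical `first_by_label` helper:

```python
import pandas as pd

group_like = pd.Series(["a", "b", "c"], index=[1, 4, 5])  # no label 0, like the group
rebuilt = pd.Series(["a", "b", "c"])                       # labels 0, 1, 2, like the copy

def first_by_label(series):
    """Hypothetical helper: look up label 0 and report a KeyError instead of raising."""
    try:
        return series[0]
    except KeyError as err:
        return "KeyError: %s" % err

print(first_by_label(rebuilt))
print(first_by_label(group_like))
# After resetting the index, label 0 exists again.
print(first_by_label(group_like.reset_index(drop=True)))
```

This would also be consistent with the earlier case where the stored group `x` worked: its head shows a row labeled 0.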

As the pirates might say in the first few episodes of Archer season 3: what the hell, damn guy?

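1 answer:

Answer 0 (score: 1)

It turns out this stems from something that happens inside groupby, which changes the group's index to a MultiIndex.

Adding a call that resets the index inside the function passed to apply solves the problem: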

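def foo(zz):
    # reset_index() gives the group a fresh 0..n-1 index before "dt" becomes the index;
    # inplace=True is dropped because the result of the chain is assigned anyway
    zz1 = zz.sort("dt", ascending=True).reset_index().set_index("dt")
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
    return r1.beta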

This at least provides a workaround.
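A minimal sketch of the same workaround on the simplified frame, assuming a recent pandas rather than 0.8.1. `index_by_date` is a hypothetical helper, and `reset_index(drop=True)` is used here instead of the plain `reset_index()` above so the old row labels are discarded rather than kept as a column.

```python
import datetime
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
tdf = pd.DataFrame({
    "dt": [datetime.date(2000, 1, i + 1) for i in range(12)]
          + [datetime.date(2001, 3, j + 1) for j in range(13)],
    "d": rng.randint(1, 4, size=25),
    "x": rng.rand(25),
})

def index_by_date(group):
    """Hypothetical helper: drop the inherited row labels, then index by date."""
    return group.drop(columns="d").reset_index(drop=True).set_index("dt")

# Apply the helper per group and stack the pieces under a (d, dt) index.
pieces = {key: index_by_date(group) for key, group in tdf.groupby("d")}
stacked = pd.concat(pieces, names=["d", "dt"])
print(stacked.head())
```

The result has the two-level index described in the question: one level for the category and one for the date.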