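I am working with Pandas 0.8.1 in an environment where, for bureaucratic reasons, I cannot upgrade.
You may want to skip down to the "Simplified Problem" section below before reading all of the details of the initial problem and goal.
My goal: group a DataFrame by the categorical column "D", then for each group sort by the date column "dt", set the index to "dt", perform a rolling OLS regression, and return a DataFrame beta of regression coefficients indexed by date.
The end result would hopefully be a bunch of stacked beta frames, each one unique to some particular value of the categorical variable, so the final index would have two levels: one for the category ID and one for the date.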
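To make the goal concrete, here is a minimal pure-Python sketch of the computation I am after, with no pandas involved. The helper names (ols_beta, rolling_betas, grouped_rolling_betas) and the toy data are made up for illustration, and the window handling only approximates what pandas.ols does with window and min_periods.

```python
import datetime
from collections import defaultdict


def ols_beta(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rolling_betas(rows, window=60, min_periods=12):
    """rows: list of (date, x, y) for one category. Returns {date: (slope, intercept)}."""
    rows = sorted(rows, key=lambda r: r[0])  # sort the group by date
    out = {}
    for end in range(len(rows)):
        chunk = rows[max(0, end - window + 1):end + 1]
        if len(chunk) < min_periods:
            continue  # not enough observations yet for this date
        xs = [r[1] for r in chunk]
        ys = [r[2] for r in chunk]
        out[rows[end][0]] = ols_beta(xs, ys)
    return out


def grouped_rolling_betas(records, window=60, min_periods=12):
    """records: iterable of (category, date, x, y). Returns {(category, date): beta}."""
    groups = defaultdict(list)
    for cat, dt, x, y in records:
        groups[cat].append((dt, x, y))
    result = {}
    for cat, rows in groups.items():
        for dt, beta in rolling_betas(rows, window, min_periods).items():
            result[(cat, dt)] = beta  # two-level key: category ID, then date
    return result


# Toy data: two categories, with y = 2x + 1 exactly.
records = []
for i in range(40):
    x = float(i)
    day = datetime.date(2000, 1, 1) + datetime.timedelta(days=i)
    records.append((i % 2, day, x, 2.0 * x + 1.0))

betas = grouped_rolling_betas(records, window=5, min_periods=3)
```

The dictionary keyed by (category, date) plays the role of the two-level index described above; in pandas I would want the same shape as a DataFrame with a MultiIndex.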
If I do
my_dataframe.groupby("D").apply(some_wrapped_OLS_caller)
then I frequently get a frustratingly uninformative KeyError: 0 error, and the traceback seems to choke on date-time handling:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
2287 arrays.append(level)
2288
-> 2289 index = MultiIndex.from_arrays(arrays, names=keys)
2290
2291 if verify_integrity and not index.is_unique:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1505 if len(arrays) == 1:
1506 name = None if names is None else names[0]
-> 1507 return Index(arrays[0], name=name)
1508
1509 cats = [Categorical.from_array(arr) for arr in arrays]
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
102 if dtype is None:
103 if (lib.is_datetime_array(subarr)
--> 104 or lib.is_datetime64_array(subarr)
105 or lib.is_timestamp_array(subarr)):
106 from pandas.tseries.index import DatetimeIndex
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
427 def __getitem__(self, key):
428 try:
--> 429 return self.index.get_value(self, key)
430 except InvalidIndexError:
431 pass
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
639 """
640 try:
--> 641 return self._engine.get_value(series, key)
642 except KeyError, e1:
643 if len(self) > 0 and self.inferred_type == 'integer':
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()
KeyError: 0
If I perform the regression steps manually on each group in the group-by object, one by one, then everything works without a hitch.
Code:
import numpy as np
import pandas
import datetime
from dateutil.relativedelta import relativedelta as drr
def foo(zz):
    zz1 = zz.sort("dt", ascending=True).set_index("dt")
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
    return r1.beta
dfrm_test = pandas.DataFrame({"x":np.random.rand(731),
"y1":np.random.rand(731),
"y2":np.random.rand(731),
"z":np.random.rand(731)})
dfrm_test['d'] = np.random.randint(0,2, size= (len(dfrm_test),))
dfrm_test['dt'] = [datetime.date(2000, 1, 1) + drr(days=i)
for i in range(len(dfrm_test))]
Now, here is what happens when I try to process this with groupby and apply:
In [102]: dfrm_test.groupby("d").apply(foo)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-102-345a8d45df50> in <module>()
----> 1 dfrm_test.groupby("d").apply(foo)
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
267 applied : type depending on grouped object and function
268 """
--> 269 return self._python_apply_general(func, *args, **kwargs)
270
271 def aggregate(self, func, *args, **kwargs):
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs)
402 group_axes = _get_axes(group)
403
--> 404 res = func(group, *args, **kwargs)
405
406 if not _is_indexed_like(res, group_axes):
<ipython-input-101-8b9184c63365> in foo(zz)
1 def foo(zz):
----> 2 zz1 = zz.sort("dt", ascending=True).set_index("dt")
3 r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
4 return r1.beta
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
2287 arrays.append(level)
2288
-> 2289 index = MultiIndex.from_arrays(arrays, names=keys)
2290
2291 if verify_integrity and not index.is_unique:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1505 if len(arrays) == 1:
1506 name = None if names is None else names[0]
-> 1507 return Index(arrays[0], name=name)
1508
1509 cats = [Categorical.from_array(arr) for arr in arrays]
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
102 if dtype is None:
103 if (lib.is_datetime_array(subarr)
--> 104 or lib.is_datetime64_array(subarr)
105 or lib.is_timestamp_array(subarr)):
106 from pandas.tseries.index import DatetimeIndex
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
427 def __getitem__(self, key):
428 try:
--> 429 return self.index.get_value(self, key)
430 except InvalidIndexError:
431 pass
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
639 """
640 try:
--> 641 return self._engine.get_value(series, key)
642 except KeyError, e1:
643 if len(self) > 0 and self.inferred_type == 'integer':
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()
KeyError: 0
If I save the groupby object and try to apply foo to each group myself, the straightforward way also fails:
In [103]: grps = dfrm_test.groupby("d")
In [104]: for grp in grps:
   .....:     foo(grp[1])
   .....:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-104-f215ff55c12b> in <module>()
1 for grp in grps:
----> 2 foo(grp[1])
3
<ipython-input-101-8b9184c63365> in foo(zz)
1 def foo(zz):
----> 2 zz1 = zz.sort("dt", ascending=True).set_index("dt")
3 r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
4 return r1.beta
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
2287 arrays.append(level)
2288
-> 2289 index = MultiIndex.from_arrays(arrays, names=keys)
2290
2291 if verify_integrity and not index.is_unique:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1505 if len(arrays) == 1:
1506 name = None if names is None else names[0]
-> 1507 return Index(arrays[0], name=name)
1508
1509 cats = [Categorical.from_array(arr) for arr in arrays]
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
102 if dtype is None:
103 if (lib.is_datetime_array(subarr)
--> 104 or lib.is_datetime64_array(subarr)
105 or lib.is_timestamp_array(subarr)):
106 from pandas.tseries.index import DatetimeIndex
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
427 def __getitem__(self, key):
428 try:
--> 429 return self.index.get_value(self, key)
430 except InvalidIndexError:
431 pass
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
639 """
640 try:
--> 641 return self._engine.get_value(series, key)
642 except KeyError, e1:
643 if len(self) > 0 and self.inferred_type == 'integer':
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()
KeyError: 0
But if I store off one of the group DataFrames and then call foo on it, it works just fine... ??
In [105]: for grp in grps:
   .....:     x = grp[1]
   .....:
In [106]: x.head()
Out[106]:
x y1 y2 z dt d
0 0.240858 0.235135 0.196027 0.940180 2000-01-01 1
1 0.115784 0.802576 0.870014 0.482418 2000-01-02 1
2 0.081640 0.939411 0.344041 0.846485 2000-01-03 1
5 0.608413 0.100349 0.306595 0.739987 2000-01-06 1
6 0.429635 0.678575 0.449520 0.362761 2000-01-07 1
In [107]: foo(x)
Out[107]:
<class 'pandas.core.frame.DataFrame'>
Index: 360 entries, 2000-01-17 to 2001-12-29
Data columns:
x 360 non-null values
intercept 360 non-null values
dtypes: float64(2)
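What is going on here? Does it have to do with when the logic that converts to a bad date/time type gets tripped? How can I work around it?
Simplified Problem
I can reduce the problem to just the set_index call inside the function passed to apply. But this gets really weird. Here is an example with a simpler test DataFrame, using only set_index.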
In [154]: tdf = pandas.DataFrame(
{"dt":([datetime.date(2000,1,i+1) for i in range(12)] +
[datetime.date(2001,3,j+1) for j in range(13)]),
"d":np.random.randint(1,4,(25,)),
"x":np.random.rand(25)})
In [155]: tdf
Out[155]:
d dt x
0 1 2000-01-01 0.430667
1 3 2000-01-02 0.159652
2 1 2000-01-03 0.719015
3 1 2000-01-04 0.175328
4 3 2000-01-05 0.233810
5 3 2000-01-06 0.581176
6 1 2000-01-07 0.912615
7 1 2000-01-08 0.534971
8 3 2000-01-09 0.373345
9 1 2000-01-10 0.182665
10 1 2000-01-11 0.286681
11 3 2000-01-12 0.054054
12 3 2001-03-01 0.861348
13 1 2001-03-02 0.093717
14 2 2001-03-03 0.729503
15 1 2001-03-04 0.888558
16 1 2001-03-05 0.263055
17 1 2001-03-06 0.558430
18 3 2001-03-07 0.064216
19 3 2001-03-08 0.018823
20 3 2001-03-09 0.207845
21 2 2001-03-10 0.735640
22 2 2001-03-11 0.908427
23 2 2001-03-12 0.819994
24 2 2001-03-13 0.798267
set_index works fine here, with no date changes or anything.
In [156]: tdf.set_index("dt")
Out[156]:
d x
dt
2000-01-01 1 0.430667
2000-01-02 3 0.159652
2000-01-03 1 0.719015
2000-01-04 1 0.175328
2000-01-05 3 0.233810
2000-01-06 3 0.581176
2000-01-07 1 0.912615
2000-01-08 1 0.534971
2000-01-09 3 0.373345
2000-01-10 1 0.182665
2000-01-11 1 0.286681
2000-01-12 3 0.054054
2001-03-01 3 0.861348
2001-03-02 1 0.093717
2001-03-03 2 0.729503
2001-03-04 1 0.888558
2001-03-05 1 0.263055
2001-03-06 1 0.558430
2001-03-07 3 0.064216
2001-03-08 3 0.018823
2001-03-09 3 0.207845
2001-03-10 2 0.735640
2001-03-11 2 0.908427
2001-03-12 2 0.819994
2001-03-13 2 0.798267
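Under groupby, set_index does not succeed (note that the error occurs before any unpacking issues with inconsistent sizes could arise; it simply cannot reset the index at all).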
In [157]: tdf.groupby("d").apply(lambda x: x.set_index("dt"))
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-157-cf2d3964f4d3> in <module>()
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt"))
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
267 applied : type depending on grouped object and function
268 """
--> 269 return self._python_apply_general(func, *args, **kwargs)
270
271 def aggregate(self, func, *args, **kwargs):
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, func, *args, **kwargs)
402 group_axes = _get_axes(group)
403
--> 404 res = func(group, *args, **kwargs)
405
406 if not _is_indexed_like(res, group_axes):
<ipython-input-157-cf2d3964f4d3> in <lambda>(x)
----> 1 tdf.groupby("d").apply(lambda x: x.set_index("dt"))
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
2287 arrays.append(level)
2288
-> 2289 index = MultiIndex.from_arrays(arrays, names=keys)
2290
2291 if verify_integrity and not index.is_unique:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1505 if len(arrays) == 1:
1506 name = None if names is None else names[0]
-> 1507 return Index(arrays[0], name=name)
1508
1509 cats = [Categorical.from_array(arr) for arr in arrays]
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
102 if dtype is None:
103 if (lib.is_datetime_array(subarr)
--> 104 or lib.is_datetime64_array(subarr)
105 or lib.is_timestamp_array(subarr)):
106 from pandas.tseries.index import DatetimeIndex
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
427 def __getitem__(self, key):
428 try:
--> 429 return self.index.get_value(self, key)
430 except InvalidIndexError:
431 pass
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
639 """
640 try:
--> 641 return self._engine.get_value(series, key)
642 except KeyError, e1:
643 if len(self) > 0 and self.inferred_type == 'integer':
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()
KeyError: 0
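The really weird part
Here I save the group objects and try to call set_index manually. This doesn't work. Even when I save off the specific DataFrame element from the groups, it doesn't work.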
In [159]: grps = tdf.groupby("d")
In [160]: grps
Out[160]: <pandas.core.groupby.DataFrameGroupBy at 0x7600bd0>
In [161]: grps_list = [(x,y) for x,y in grps]
In [162]: grps_list[2][1].set_index("dt")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-162-77f985a6e063> in <module>()
----> 1 grps_list[2][1].set_index("dt")
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, inplace, verify_integrity)
2287 arrays.append(level)
2288
-> 2289 index = MultiIndex.from_arrays(arrays, names=keys)
2290
2291 if verify_integrity and not index.is_unique:
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in from_arrays(cls, arrays, sortorder, names)
1505 if len(arrays) == 1:
1506 name = None if names is None else names[0]
-> 1507 return Index(arrays[0], name=name)
1508
1509 cats = [Categorical.from_array(arr) for arr in arrays]
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in __new__(cls, data, dtype, copy, name)
102 if dtype is None:
103 if (lib.is_datetime_array(subarr)
--> 104 or lib.is_datetime64_array(subarr)
105 or lib.is_timestamp_array(subarr)):
106 from pandas.tseries.index import DatetimeIndex
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.is_datetime64_array (pandas/src/tseries.c:90291)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
427 def __getitem__(self, key):
428 try:
--> 429 return self.index.get_value(self, key)
430 except InvalidIndexError:
431 pass
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
639 """
640 try:
--> 641 return self._engine.get_value(series, key)
642 except KeyError, e1:
643 if len(self) > 0 and self.inferred_type == 'integer':
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103842)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_value (pandas/src/tseries.c:103670)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.IndexEngine.get_loc (pandas/src/tseries.c:104379)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15547)()
/opt/epd/7.3-2_pandas0.8.1/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.Int64HashTable.get_item (pandas/src/tseries.c:15501)()
KeyError: 0
But if I construct a direct manual copy of that group's DataFrame, then set_index works on the manual reconstruction?
In [163]: grps_list[2][1]
Out[163]:
d dt x
1 3 2000-01-02 0.159652
4 3 2000-01-05 0.233810
5 3 2000-01-06 0.581176
8 3 2000-01-09 0.373345
11 3 2000-01-12 0.054054
12 3 2001-03-01 0.861348
18 3 2001-03-07 0.064216
19 3 2001-03-08 0.018823
20 3 2001-03-09 0.207845
In [165]: recreation = pandas.DataFrame(
{"d":[3,3,3,3,3,3,3,3,3],
"dt":[datetime.date(2000,1,2), datetime.date(2000,1,5), datetime.date(2000,1,6),
datetime.date(2000,1,9), datetime.date(2000,1,12), datetime.date(2001,3,1),
datetime.date(2001,3,7), datetime.date(2001,3,8), datetime.date(2001,3,9)],
"x":[0.159, 0.233, 0.581, 0.3733, 0.054, 0.861, 0.064, 0.0188, 0.2078]})
In [166]: recreation
Out[166]:
d dt x
0 3 2000-01-02 0.1590
1 3 2000-01-05 0.2330
2 3 2000-01-06 0.5810
3 3 2000-01-09 0.3733
4 3 2000-01-12 0.0540
5 3 2001-03-01 0.8610
6 3 2001-03-07 0.0640
7 3 2001-03-08 0.0188
8 3 2001-03-09 0.2078
In [167]: recreation.set_index("dt")
Out[167]:
d x
dt
2000-01-02 3 0.1590
2000-01-05 3 0.2330
2000-01-06 3 0.5810
2000-01-09 3 0.3733
2000-01-12 3 0.0540
2001-03-01 3 0.8610
2001-03-07 3 0.0640
2001-03-08 3 0.0188
2001-03-09 3 0.2078
As pirates would probably say in the first few episodes of Archer Season 3: What the damn hell damn?
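Answer 0 (score: 1)
It turns out this is caused by something that happens inside groupby, which changes the group's index into a MultiIndex.
Adding a call that resets the index inside the function passed to apply gets rid of the problem: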
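def foo(zz):
    # reset_index() swaps the index left behind by groupby for a fresh default
    # one before "dt" becomes the index. inplace=True is dropped so that zz1
    # is always the returned frame.
    zz1 = zz.sort("dt", ascending=True).reset_index().set_index("dt")
    r1 = pandas.ols(y=zz1["y1"], x=zz1["x"], window=60, min_periods=12)
    return r1.beta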
This at least provides a workaround.
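For the simplified set_index example, the same idea can be sketched as below. This is written against a recent pandas rather than 0.8.1, so I can't confirm that it reproduces or fixes the original KeyError there. The helper name set_dt_index, the level name "category", and the toy frame are made up for illustration.

```python
import datetime

import numpy as np
import pandas as pd


def set_dt_index(group):
    # Replace whatever index groupby left on the group with a fresh
    # 0..n-1 index, then promote the "dt" column to the index.
    return group.reset_index(drop=True).set_index("dt")


tdf = pd.DataFrame({
    "dt": [datetime.date(2000, 1, i + 1) for i in range(12)],
    "d": [1, 2, 3] * 4,
    "x": np.arange(12, dtype=float),
})

# Apply the helper to each group by hand, then stack the pieces so the
# result is indexed by (category, date).
pieces = {key: set_dt_index(grp) for key, grp in tdf.groupby("d")}
stacked = pd.concat(pieces, names=["category", "dt"])
```

Looping over the groups and concatenating by hand avoids relying on how apply reassembles the per-group results, at the cost of a little extra code.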