StopIteration error when using the groupby method of a pandas DataFrame

Date: 2014-07-26 19:27:54

Tags: python-3.x pandas

I have run into a problem with the groupby method similar to the one described in this StackOverflow post:

pandas group StopIteration error

What I am trying to do with groupby is simpler, but I get a similar StopIteration error:

Traceback (most recent call last):
  File "prepare_data_TJ2012_v1p0.py", line 107, in <module>
    grouped = df.groupby('hh').apply(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 637, in apply
    return self._python_apply_general(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    not_indexed_same=mutated)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 2657, in _wrap_applied_output
    v = next(v for v in values if v is not None)
StopIteration

Here is the code that produces it:

import pandas as pd

df = pd.DataFrame(
            {'educ': {0: 'pri', 1: 'bach', 2: 'pri', 3: 'hi', 4: 'bach', 5: 'sec', 
                6: 'hi', 7: 'hi', 8: 'pri', 9: 'pri'}, 
             'hh': {0: 1, 1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4, 8: 4, 9: 4}, 
             'id': {0: 1, 1: 2, 2: 3, 3: 1, 4: 1, 5: 2, 6: 1, 7: 2, 8: 3, 9: 4}, 
             'has_car': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 1, 8: 1, 9: 1}, 
             'weighthh': {0: 2, 1: 2, 2: 2, 3: 3, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 3}, 
             'house_rooms': {0: 3, 1: 3, 2: 3, 3: 2, 4: 1, 5: 1, 6: 3, 7: 3, 8: 3, 9: 3}, 
             'prov': {0: 'BC', 1: 'BC', 2: 'BC', 3: 'Alberta', 4: 'BC', 5: 'BC', 6: 'Alberta', 
                7: 'Alberta', 8: 'Alberta', 9: 'Alberta'}, 
             'age': {0: 44, 1: 43, 2: 13, 3: 70, 4: 23, 5: 20, 6: 37, 7: 35, 8: 8, 9: 15}, 
             'fridge': {0: 'yes', 1: 'yes', 2: 'yes', 3: 'no', 4: 'yes', 5: 'yes', 6: 'no', 
                7: 'no', 8: 'no', 9: 'no'}, 
             'male': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}})
print(df)
print('-- groupby dataframes ---')
def f(df):
    print('-------------------------')
    print('DataFrame' )
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))
df.groupby('hh').apply(f)

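I want to perform an operation on a column, group by group, whenever another column has at least one non-null value in that group.

I am using pandas==0.14.1. It looks as if the loop over the groups runs too far. Is this a bug? (Or am I using the groupby method incorrectly?)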

1 Answer:

Answer 0 (score: 8)

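You get this error because the function you pass to apply does not return anything. As the traceback shows, pandas 0.14.1 calls next(v for v in values if v is not None) on the per-group results; when f returns None for every group, that generator is empty and next raises StopIteration. If all you care about is the printed output, you can simply return df, like this: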

def f(df):
    print('-------------------------')
    print('DataFrame' )
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))

    return df

apply then runs without error:

In [295]: df.groupby('hh').apply(f)
-------------------------
DataFrame
   age  educ fridge  has_car  hh  house_rooms  id  male prov  weighthh
0   44   pri    yes        1   1            3   1     1   BC         2
1   43  bach    yes        1   1            3   2     0   BC         2
2   13   pri    yes        1   1            3   3     1   BC         2
.....
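
If the goal is the one stated in the question (operate on one column per group only when another column has at least one non-null value in that group), it may be simpler to have groupby return values instead of printing inside f. The following is only a minimal sketch under some assumptions: the small frame is made up for illustration (it reuses the hh, age and has_car column names from the question but adds missing ages), and summing has_car is just a stand-in for whatever operation is actually needed.

```python
import numpy as np
import pandas as pd

# Illustrative data only: same column names as the question,
# but with some missing ages so the condition matters.
df = pd.DataFrame({
    "hh": [1, 1, 2, 3, 3],
    "age": [44.0, np.nan, np.nan, 23.0, 20.0],
    "has_car": [1, 1, 1, 0, 0],
})

# count() on a grouped column counts the non-null values in each group.
non_null_ages = df.groupby("hh")["age"].count()
print(non_null_ages)

# transform("count") broadcasts the per-group count back to every row,
# giving a boolean mask for rows whose household has at least one age.
has_age = df.groupby("hh")["age"].transform("count") > 0

# Stand-in operation: sum has_car, but only for households passing the check.
cars = df[has_age].groupby("hh")["has_car"].sum()
print(cars)
```

Household 2 has no non-null age here, so it is dropped before the sum. Because every step returns a Series, nothing depends on how apply handles a function that returns None.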