Manipulating a groupby object

Date: 2017-10-27 01:06:02

Tags: python pandas pandas-groupby

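I have searched other questions, but none of them addresses this problem. The focus of this question is manipulating the groups directly.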

Suppose I have the following DataFrame:

     A  B  C    Bg
0    1  X  1  None
1    2  A  7  None
2    3  X  9     1
3    4  X  1     1
4    5  B  1  None
5    6  X  0  None
6    7  C  8  None
7    8  A  5  None
8    9  X  9     2
9   10  X  4     2
10  11  X  2     2
11  12  A  4  None
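To reproduce the example, the frame can be rebuilt as below. This is a sketch under one assumption the question does not state: Bg is taken to be an object column holding Python ints, with None for rows that belong to no group. (With a plain numeric column, pandas would turn None into NaN and the group names into 1.0 and 2.0.)

```python
import pandas as pd

# Rebuild the question's frame. Assumption: Bg is an object column of ints,
# with None marking rows that are not in any group.
df2 = pd.DataFrame({
    'A': list(range(1, 13)),
    'B': list('XAXXBXCAXXXA'),
    'C': [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
    'Bg': pd.Series([None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
                    dtype=object),
})

groups = df2.groupby('Bg')  # rows whose Bg is None are left out of every group
for name, group in groups:
    print('name:', name, '\n', group, '\n\n')
```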

Then I group by the 'Bg' column:

groups = df2.groupby('Bg')
for name, group in groups:
    print('name:', name, '\n', group, '\n\n')

The groups look like this:

name: 1 
    A  B  C Bg
2  3  X  9  1
3  4  X  1  1 


name: 2 
      A  B  C Bg
8    9  X  9  2
9   10  X  4  2
10  11  X  2  2 

I wrote the following code to perform some tasks and manipulate the groups:

import copy
import numpy as np

groups3 = copy.deepcopy(groups)
for name, group in groups3:
    idx_first = group.index[0]
    idx_last = group.index[-1]
    if name == 2:
        # drop the first index label from group 2's entry in .groups
        groups3.groups[name] = np.delete(groups3.groups[name], range(0, 1), axis=0)
    else:
        # remove every other group from .groups
        del groups3.groups[name]
print('groups', groups3.groups)

print('-------')
for name, group in groups3:
    print(group)

The output is:

groups {2: Int64Index([9, 10], dtype='int64')}
-------
   A  B  C Bg
2  3  X  9  1
3  4  X  1  1
     A  B  C Bg
8    9  X  9  2
9   10  X  4  2
10  11  X  2  2

However, I expected this output:

groups {2: Int64Index([9, 10], dtype='int64')}
-------
     A  B  C Bg
9   10  X  4  2
10  11  X  2  2

1 Answer:

Answer 0 (score: 2)

This is a seriously messy rabbit hole...


Short story
Iteration over a groupby object is not controlled by the dictionary returned by groups.
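A small check of that claim, on a hypothetical five-row frame rather than the question's df2: deleting a key from the dict returned by groups leaves the iteration unchanged, which matches what the question observed.

```python
import pandas as pd

# Hypothetical mini frame with two groups (1 and 2)
df = pd.DataFrame({'Bg': [1, 1, 2, 2, 2], 'A': [3, 4, 9, 10, 11]})
g = df.groupby('Bg')

d = g.groups   # dict-like {group name -> index labels}
del d[1]       # mutate the dict pandas handed back

# Iteration is driven by the grouper, not by that dict,
# so both groups are still produced.
names = [name for name, _ in g]
print(names)
```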

Iteration starts at def __iter__:
def __iter__(self):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    return self.grouper.get_iterator(self.obj, axis=self.axis)

then moves on to def get_iterator:

def get_iterator(self, data, axis=0):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    splitter = self._get_splitter(data, axis=axis)
    keys = self._get_group_keys()
    for key, (i, group) in zip(keys, splitter):
        yield key, group

其中引用了_get_splitter_get_group_keys

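In both of those, we see that group_info returns an obscure, well-protected tuple that controls the iteration. I couldn't figure out how to take full control of the iteration, but I did manage to break it: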

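# group_info is (comp_ids, obs_group_ids, ngroups); comp_ids holds each row's group number
a, b, c = groups3.grouper.group_info
# relabel the rows of group number 1 (Bg == 2) as -1, i.e. "in no group"
a[a == 1] = -1

for name, group in groups3:
    print(group)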

   A  B  C Bg
2  3  X  9  1
3  4  X  1  1
Empty DataFrame
Columns: [A, B, C, Bg]
Index: []
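To see why the second group came out empty rather than disappearing, here is a simplified model of the iteration. It is a sketch, not pandas' actual code: iter_by_codes is a hypothetical helper, and the public ngroup() method, which numbers each row's group, stands in for the first element of group_info. Relabelling group number 1 as -1 reproduces the empty frame above.

```python
import pandas as pd

def iter_by_codes(df, codes, names):
    """Hypothetical helper: yield (name, rows) from per-row group numbers.

    A simplified stand-in for get_iterator, not pandas' implementation.
    """
    for k, name in enumerate(names):
        yield name, df[codes == k]

df = pd.DataFrame({'A': [3, 4, 9, 10, 11], 'Bg': [1, 1, 2, 2, 2]})
g = df.groupby('Bg')

codes = g.ngroup().to_numpy().copy()  # group number of each row: [0 0 1 1 1]
names = sorted(g.groups)              # sort=True numbers groups in sorted key order

# Same idea as a[a == 1] = -1: rows of group number 1 now match no group,
# but the name is still iterated, so that group comes out empty.
codes[codes == 1] = -1
result = dict(iter_by_codes(df, codes, names))
print(result[1])
print(result[2])
```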

My advice... don't do this!

Option 1
filter, then groupby again

df2.groupby('Bg').filter(lambda x: x.name != 2).groupby('Bg')
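Option 1 as written drops group 2, whereas the question's expected output keeps only group 2 minus its first row. A sketch of that variant, assuming integer group names and using cumcount() to number rows within each group; the frame is reduced to the grouped rows of df2, with their original index labels.

```python
import pandas as pd

# Only the grouped rows of the question's df2, original index labels kept
df2 = pd.DataFrame({
    'A': [3, 4, 9, 10, 11],
    'B': ['X'] * 5,
    'C': [9, 1, 9, 4, 2],
    'Bg': [1, 1, 2, 2, 2],
}, index=[2, 3, 8, 9, 10])

kept = df2.groupby('Bg').filter(lambda g: g.name == 2)  # keep only group 2
# cumcount numbers rows 0, 1, 2, ... within each group; drop each group's first row
trimmed = kept[kept.groupby('Bg').cumcount() > 0]

for name, group in trimmed.groupby('Bg'):
    print('name:', name)
    print(group)
```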

Option 2
Dictionary comprehension

{name: group for name, group in groups3 if name != 2}
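The question's expected output can also be reached with a dictionary comprehension, again on the grouped rows of df2 and assuming integer group names. The slicing is applied to each yielded group, so the groupby object itself is never modified.

```python
import pandas as pd

# Only the grouped rows of the question's df2, original index labels kept
df2 = pd.DataFrame({
    'A': [3, 4, 9, 10, 11],
    'B': ['X'] * 5,
    'C': [9, 1, 9, 4, 2],
    'Bg': [1, 1, 2, 2, 2],
}, index=[2, 3, 8, 9, 10])

groups = df2.groupby('Bg')
# keep only group 2 and drop its first row with positional slicing
wanted = {name: group.iloc[1:] for name, group in groups if name == 2}

for name, group in wanted.items():
    print('name:', name)
    print(group)
```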