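I have searched other questions, but none of them address this problem. This question is about manipulating the groups of a groupby object directly.
Suppose I have the following DataFrame: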
     A  B  C    Bg
0    1  X  1  None
1    2  A  7  None
2    3  X  9     1
3    4  X  1     1
4    5  B  1  None
5    6  X  0  None
6    7  C  8  None
7    8  A  5  None
8    9  X  9     2
9   10  X  4     2
10  11  X  2     2
11  12  A  4  None
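For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. It assumes the Bg column has object dtype holding None and plain integers, which is what the printed output suggests; the question does not show how df2 was actually created.

```python
import pandas as pd

# Assumption: Bg is an object column with None and integer labels,
# so the group keys come out as the integers 1 and 2.
df2 = pd.DataFrame({
    "A": range(1, 13),
    "B": list("XAXXBXCAXXXA"),
    "C": [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
    "Bg": pd.Series(
        [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
        dtype=object,
    ),
})

# Rows whose Bg is None do not belong to any group.
group_sizes = df2.groupby("Bg").size()
```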
Then I group by the 'Bg' column:
groups = df2.groupby('Bg')
for name, group in groups:
    print('name:', name, '\n', group, '\n\n')
The groups look like this:
name: 1
   A  B  C Bg
2  3  X  9  1
3  4  X  1  1

name: 2
     A  B  C Bg
8    9  X  9  2
9   10  X  4  2
10  11  X  2  2
I wrote the following code to do some work on the groups and manipulate them:
import copy
import numpy as np

groups3 = copy.deepcopy(groups)

for name, group in groups3:
    idx_first = group.index[0]
    idx_last = group.index[-1]
    if name == 2:
        # drop the first row label of group 2
        groups3.groups[name] = np.delete(groups3.groups[name], range(0, 1), axis=0)
    else:
        # remove every other group
        del groups3.groups[name]

print('groups', groups3.groups)
print('-------')
for name, group in groups3:
    print(group)
The output is:
groups {2: Int64Index([9, 10], dtype='int64')}
-------
   A  B  C Bg
2  3  X  9  1
3  4  X  1  1
     A  B  C Bg
8    9  X  9  2
9   10  X  4  2
10  11  X  2  2
However, this is the output I was expecting:
groups {2: Int64Index([9, 10], dtype='int64')}
-------
     A  B  C Bg
9   10  X  4  2
10  11  X  2  2
Answer 0 (score: 2)
This is a seriously messy rabbit hole...
Short story
Iterating over groups calls __iter__:
def __iter__(self):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    return self.grouper.get_iterator(self.obj, axis=self.axis)

which calls get_iterator on the grouper:

def get_iterator(self, data, axis=0):
    """
    Groupby iterator

    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group
    """
    splitter = self._get_splitter(data, axis=axis)
    keys = self._get_group_keys()
    for key, (i, group) in zip(keys, splitter):
        yield key, group
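which in turn references _get_splitter and _get_group_keys.
In both of these we see group_info, which returns an obscure and well-protected tuple that controls the iteration. Note that none of this consults the groups dictionary, which is why editing groups3.groups had no effect on the loop. I could not figure out how to fully control the iteration, but I was able to mess with it: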
a, b, c = groups3.grouper.group_info
a[a==1] = -1
for name, group in groups3:
    print(group)
   A  B  C Bg
2  3  X  9  1
3  4  X  1  1
Empty DataFrame
Columns: [A, B, C, Bg]
Index: []
My advice... don't do this!
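The behaviour seen in the question can be checked in isolation. The sketch below assumes the same frame as in the question (rebuilt here with an object-dtype Bg column, which is a guess) and shows that deleting a key from the groups dictionary leaves iteration untouched, because the iterator is driven by the grouper internals rather than by that dictionary.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df2 = pd.DataFrame({
    "A": range(1, 13),
    "B": list("XAXXBXCAXXXA"),
    "C": [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
    "Bg": pd.Series(
        [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
        dtype=object,
    ),
})

groups = df2.groupby("Bg")

# Edit only the label dictionary, as the question does.
del groups.groups[1]

# Iteration still walks over both groups, with all of their rows.
names_seen = [name for name, _ in groups]
sizes_seen = [len(group) for _, group in groups]
```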
Option 1
filter, then groupby again
df2.groupby('Bg').filter(lambda x: x.name != 2).groupby('Bg')
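A runnable sketch of Option 1, under the assumption that the frame looks like the one in the question (the construction below is a reconstruction, not the asker's code). The group keys in the question are integers, so the predicate compares against 2 rather than the string '2'. filter keeps only rows from groups for which the function returns True, and the second groupby then sees only those rows.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df2 = pd.DataFrame({
    "A": range(1, 13),
    "B": list("XAXXBXCAXXXA"),
    "C": [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
    "Bg": pd.Series(
        [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
        dtype=object,
    ),
})

# Keep every group except the one whose key is 2.
filtered = df2.groupby("Bg").filter(lambda g: g.name != 2)

# Group again; the new groupby object no longer contains group 2.
regrouped = filtered.groupby("Bg")
remaining_keys = list(regrouped.groups)
```

Flipping the predicate to g.name == 2 would keep group 2 instead, which is closer to what the question asks for.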
Option 2
Dictionary comprehension
{name: group for name, group in groups3 if name != 2}
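The same comprehension can be adapted to produce the output the question was expecting, namely only group 2 with its first row dropped. This is a sketch under the assumption that the frame matches the question's (reconstructed below) and that "first row" means the first row in the group's existing order; the name wanted is just illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's frame.
df2 = pd.DataFrame({
    "A": range(1, 13),
    "B": list("XAXXBXCAXXXA"),
    "C": [1, 7, 9, 1, 1, 0, 8, 5, 9, 4, 2, 4],
    "Bg": pd.Series(
        [None, None, 1, 1, None, None, None, None, 2, 2, 2, None],
        dtype=object,
    ),
})

# Keep only group 2 and slice off its first row.
wanted = {
    name: group.iloc[1:]
    for name, group in df2.groupby("Bg")
    if name == 2
}

for name, group in wanted.items():
    print(group)
```

The result is a plain dictionary of DataFrames rather than a groupby object, so groupby methods such as agg are not available on it; pd.concat(wanted.values()).groupby('Bg') would turn it back into one if needed.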