Question

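I have a DataFrame, and I want to split it into separate DataFrames wherever the next row's index is more than 1 greater than the previous row's index (for example, if it jumps from index 73 to 75 or higher). How can I do this?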

Answer 1

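This can be done with a variant of the usual compare-cumsum-groupby pattern, applied to the index instead of a column. (At least if the index is an ordinary, sorted integer index.) For example: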

>>> import pandas as pd
>>> df = pd.DataFrame({"A": list("abcde")}, index=[1,2,4,5,8])
>>> df
   A
1  a
2  b
4  c
5  d
8  e
>>> grouped = df.groupby((df.index.to_series().diff() > 1).cumsum())
>>> for group_id, group in grouped:
...     print("group id:", group_id)
...     print(group)
...     print()
...     
group id: 0
   A
1  a
2  b

group id: 1
   A
4  c
5  d

group id: 2
   A
8  e

You can get the individual frames directly with something like frames = [g for k, g in grouped].
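If you need this more than once, the same diff/cumsum/groupby pattern can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name split_on_index_gaps and its max_step parameter are hypothetical, and it assumes the DataFrame has a sorted, increasing numeric index.

```python
import pandas as pd


def split_on_index_gaps(df: pd.DataFrame, max_step: int = 1) -> list[pd.DataFrame]:
    """Split df wherever consecutive index values differ by more than max_step.

    Hypothetical helper; assumes df has a sorted (increasing) numeric index.
    """
    # True at each row whose index jumps by more than max_step from the previous row
    # (the first row's diff is NaN, and NaN > max_step is False)
    breaks = df.index.to_series().diff() > max_step
    # Running count of breaks gives a group id that increments at each gap
    group_ids = breaks.cumsum()
    # groupby aligns group_ids with df on the index; collect the pieces in order
    return [group for _, group in df.groupby(group_ids)]


df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
frames = split_on_index_gaps(df)
for frame in frames:
    print(frame.index.tolist())
# [1, 2]
# [4, 5]
# [8]
```

Raising max_step lets you tolerate small gaps: with max_step=2 the jump from 2 to 4 no longer starts a new group, so only the jump from 5 to 8 splits the frame.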

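This works because we can use diff to measure the jumps in the index (after converting it to a Series). If we then take the cumulative sum of the booleans marking where the difference is greater than 1, we get a group id that increases by one at each gap: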

>>> df.index.to_series().diff()
1    NaN
2    1.0
4    2.0
5    1.0
8    3.0
dtype: float64
>>> df.index.to_series().diff() > 1
1    False
2    False
4     True
5    False
8     True
dtype: bool
>>> (df.index.to_series().diff() > 1).cumsum()
1    0
2    0
4    1
5    1
8    2
dtype: int64
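Each new cumsum value marks the row where a run starts. If you only need the pieces and not the group ids, the break positions can also be found directly and used to slice with iloc. This is a swapped-in NumPy/iloc sketch rather than the answer's groupby approach; the name split_by_positions is hypothetical, and it again assumes a sorted numeric index.

```python
import numpy as np
import pandas as pd


def split_by_positions(df: pd.DataFrame, max_step: int = 1) -> list[pd.DataFrame]:
    """Hypothetical alternative: slice df by position at each index gap."""
    if df.empty:
        return []
    idx = df.index.to_numpy()
    # np.diff(idx)[i] compares rows i and i+1, so add 1 to get
    # the position of the first row of each new run
    starts = np.flatnonzero(np.diff(idx) > max_step) + 1
    # Slice boundaries: start of frame, each run start, end of frame
    bounds = [0, *starts.tolist(), len(df)]
    return [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


df = pd.DataFrame({"A": list("abcde")}, index=[1, 2, 4, 5, 8])
pieces = split_by_positions(df)
print([p.index.tolist() for p in pieces])  # [[1, 2], [4, 5], [8]]
```

This avoids building a label Series, but the groupby version is usually clearer when you also want to aggregate per group.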

How to split a pandas DataFrame into groups based on row index
