Split a pandas df by repeating ranges in a column

Time: 2018-08-08 14:07:05

Tags: python pandas

Question:

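I'm trying to split a pandas DataFrame by repeating ranges in column A. My data and the desired output are below. The ranges in column A are always increasing and never skip values. However, the values in column A can start and stop arbitrarily.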

Data:

import pandas as pd

# named "data" rather than "dict" so the built-in dict is not shadowed
data = {"A": [1,2,3,2,3,4,3,4,5,6],
        "B": ["a","b","c","d","e","f","g","h","i","k"]}

df = pd.DataFrame(data)

df

   A  B
0  1  a
1  2  b
2  3  c
3  2  d
4  3  e
5  4  f
6  3  g
7  4  h
8  5  i
9  6  k

Desired output:

df1

   A  B
0  1  a
1  2  b
2  3  c

df2

   A  B
0  2  d
1  3  e
2  4  f

df3

   A  B
0  3  g
1  4  h
2  5  i
3  6  k

Thanks for your help!

Timing of the answers:

from timeit import default_timer as timer

# Answer 0: groupby on a diff/ne/cumsum key, printing each group
start = timer()
for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)
end = timer()
aa = end - start

# Answer 1: same key built with != 1, printing each group
start = timer()
s = (df.A.diff() != 1).cumsum()
g = df.groupby(s) 
for _,g_ in g:
    print(g_)
end = timer()
bb = end - start

# Answer 2: one-liner (builds the list of groups, then prints them)
start = timer()
[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]
print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')
end = timer()
cc = end - start

print(aa,bb,cc)

0.0176649530000077 0.018132143000002543 0.018715283999995336
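The timings above wrap `print` calls, so they include the cost of writing to the console, and all three answers build the same key (`.ne(1)` and `!= 1` are equivalent). A minimal sketch that times only the splitting with the standard `timeit` module; the helper `split_blocks` is a hypothetical name, and absolute numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})


def split_blocks(frame):
    """Hypothetical helper: return the list of consecutive-range blocks in column A."""
    key = frame["A"].diff().ne(1).cumsum()
    return [group for _, group in frame.groupby(key)]


# Time only the splitting; printing is deliberately kept out of the measured code.
runs = 100
elapsed = timeit.timeit(lambda: split_blocks(df), number=runs)
print(f"{elapsed / runs:.6f} s per call")
```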

3 answers:

Answer 0 (score: 3)

使用groupbydiff创建cumsum密钥

for x ,y in df.groupby(df.A.diff().ne(1).cumsum()):
    print(y)

   A  B
0  1  a
1  2  b
2  3  c
   A  B
3  2  d
4  3  e
5  4  f
   A  B
6  3  g
7  4  h
8  5  i
9  6  k
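To see why this key works, it can help to look at the intermediate Series. A minimal sketch that rebuilds the question's data; the `steps` frame and its column names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})

steps = pd.DataFrame({
    "A": df["A"],
    # difference to the previous row; NaN for the first row
    "diff": df["A"].diff(),
    # True wherever the step is not exactly +1 (NaN also counts as "not 1")
    "new_block": df["A"].diff().ne(1),
})
# running count of block starts; rows sharing a value form one group
steps["key"] = steps["new_block"].cumsum()
print(steps)
```

The first row always opens a block: `diff()` yields NaN there, and NaN compared with `ne(1)` is True.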

Answer 1 (score: 3)

Simply use groupby

s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)

for _,g_ in g:
    print(g_)

Output

   A  B
0  1  a
1  2  b
2  3  c

   A  B
3  2  d
4  3  e
5  4  f

   A  B
6  3  g
7  4  h
8  5  i
9  6  k
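Since `g` is a GroupBy object, a single block can also be fetched by its key instead of looping. A short sketch using `GroupBy.get_group` and `GroupBy.ngroups`, assuming the keys 1, 2, 3 that the cumsum produces on this data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})

s = (df.A.diff() != 1).cumsum()
g = df.groupby(s)

# the block whose key is 2, i.e. the rows with B = d, e, f
second = g.get_group(2)
print(second)

# number of blocks found
print(g.ngroups)
```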

Answer 2 (score: 2)

One-liner

Because that matters

[*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))]

Printing

print(*(d for _, d in df.groupby(df.A.diff().ne(1).cumsum())), sep='\n\n')

   A  B
0  1  a
1  2  b
2  3  c

   A  B
3  2  d
4  3  e
5  4  f

   A  B
6  3  g
7  4  h
8  5  i
9  6  k

Assignment

df1, df2, df3 = (d for _, d in df.groupby(df.A.diff().ne(1).cumsum()))
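The desired output numbers each piece from 0, while the groups above keep their original index labels (3, 4, 5, ...). A minimal sketch that renumbers them with `reset_index(drop=True)`; keeping the pieces in a list (the name `frames` is illustrative) also avoids an unpacking error when the number of blocks is not exactly three:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 2, 3, 4, 3, 4, 5, 6],
                   "B": list("abcdefghik")})

key = df.A.diff().ne(1).cumsum()

# reset_index(drop=True) renumbers each block from 0, as in the desired output
frames = [d.reset_index(drop=True) for _, d in df.groupby(key)]

# unpacking only works when there are exactly three blocks
df1, df2, df3 = frames
print(df3)
```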