Question

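I have a Series like this: test = pd.Series([2.4,5.6,8.8,25.6,53.6,1.7,5.7,8.9])

I want to split it into two Series at the point where a number is smaller than the one before it. This happens exactly once in any given Series, but not at a predictable position (it could be the 7th element, the 4th, and so on).

The result should look like this: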

test1
2.4
5.6
8.8
25.6
53.6

and

test2
1.7
5.7
8.9

Answer 1

You can find the position using

pos = (test - test.shift(-1)).argmax()

The Series up to and including that position is

>>> test[: pos + 1]
0     2.4
1     5.6
2     8.8
3    25.6
4    53.6
dtype: float64

Similarly, the rest is

>>> test[pos + 1: ]
5    1.7
6    5.7
7    8.9
dtype: float64
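
The expression above works because test - test.shift(-1) is positive only where a value is larger than its successor. A minimal sketch of the same idea wrapped in a function follows; the name split_at_drop is hypothetical (not from the answer), and it uses diff() plus positional slicing so that a Series with no drop at all returns an empty second part instead of a wrong split:

```python
import pandas as pd


def split_at_drop(s):
    """Split s just before the first value that is smaller than its predecessor.

    Hypothetical helper for illustration only.
    """
    # True wherever a value is below the previous one (the first entry is NaN -> False).
    drops = s.diff().lt(0).to_numpy()
    if not drops.any():
        # No drop found: keep everything in the first part.
        return s, s.iloc[0:0]
    pos = int(drops.argmax())  # position of the first True
    return s.iloc[:pos], s.iloc[pos:]


test = pd.Series([2.4, 5.6, 8.8, 25.6, 53.6, 1.7, 5.7, 8.9])
test1, test2 = split_at_drop(test)
```

Because the slicing is positional (iloc), this sketch should also behave the same when the index is not the default 0..n-1.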

Answer 2

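You can zip the Series with itself (offset by one) into a generator and use next to find the first drop. Then we use np.split and map the pieces to pd.Series. It should be fast: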

import pandas as pd
import numpy as np

test = pd.Series([2.4,5.6,8.8,25.6,53.6,1.7,5.7,8.9])

i = next(ind for ind, v in enumerate(zip(test,test[1:])) if v[0] > v[1])
test1, test2 = map(pd.Series,np.split(test, [i+1]))

Or written as a "one-liner":

test1, test2 = map(pd.Series,
                   np.split(test, [next((ind for ind, v in enumerate(zip(test,test[1:]))
                                         if v[0] > v[1]), None) + 1]))
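
To make the zip/next step easier to follow, here is a small sketch that spells out the intermediate values. It assumes the Series contains at least one drop (as the question states), and it swaps np.split for plain positional slicing with iloc, which is my substitution rather than part of the answer:

```python
import pandas as pd

test = pd.Series([2.4, 5.6, 8.8, 25.6, 53.6, 1.7, 5.7, 8.9])

# Pair each value with its successor: (2.4, 5.6), (5.6, 8.8), ...
pairs = list(zip(test, test.iloc[1:]))

# next() stops at the first pair whose left value exceeds the right one,
# so the rest of the Series is never inspected.
i = next(ind for ind, (a, b) in enumerate(pairs) if a > b)

# i is the position of the last element of the first part.
test1, test2 = test.iloc[:i + 1], test.iloc[i + 1:]
```

Materialising pairs as a list is only for inspection; in the answer the generator is consumed lazily, which is where the speed advantage on long inputs would come from.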

Timing comparison:

%timeit map(pd.Series,np.split(test, [next((ind for ind, v in enumerate(zip(test,test[1:])) if v[0] > v[1]), None) + 1]))
%timeit (i for _, i in test.groupby(test.diff().lt(0).cumsum()))
%timeit map(pd.Series,np.split(test, [(test - test.shift(-1)).idxmax() + 1]))

Results:

#1000 loops, best of 3: 237 µs per loop  <- Anton vbr
#1000 loops, best of 3: 599 µs per loop  <- Scott Boston
#1000 loops, best of 3: 392 µs per loop  <- Ami Tavory
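
One caveat when reproducing these numbers: in Python 3, map and generator expressions are lazy, so timing them as written may not include building the final pieces. A rough sketch of a harness that forces each approach to return both parts is below. The function names are hypothetical, all three use iloc slicing rather than np.split, and the absolute timings will depend on the machine and library versions, so no results are claimed here:

```python
import timeit

import pandas as pd

test = pd.Series([2.4, 5.6, 8.8, 25.6, 53.6, 1.7, 5.7, 8.9])


def by_generator(s):
    # zip/next approach: stop at the first pair that decreases.
    i = next(k for k, (a, b) in enumerate(zip(s, s.iloc[1:])) if a > b)
    return s.iloc[:i + 1], s.iloc[i + 1:]


def by_groupby(s):
    # groupby approach: tuple() forces every group to be built.
    return tuple(g for _, g in s.groupby(s.diff().lt(0).cumsum()))


def by_shift(s):
    # shift approach: the largest positive difference marks the drop.
    pos = (s - s.shift(-1)).argmax()
    return s.iloc[:pos + 1], s.iloc[pos + 1:]


for fn in (by_generator, by_groupby, by_shift):
    seconds = timeit.timeit(lambda: fn(test), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.0f} µs per call")
```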

Answer 3

You can do this, along the lines @AntonvBR hinted at:

for n,g in test.groupby(test.diff().lt(0).cumsum()):
    print(g)
    print("\n")

Output:

0     2.4
1     5.6
2     8.8
3    25.6
4    53.6
dtype: float64


5    1.7
6    5.7
7    8.9
dtype: float64
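
The grouping key is what does the work here, so a short sketch of how it is built may help. The variable name key is mine, and unpacking into exactly two names assumes there is exactly one drop, as the question states; with more drops the unpacking would raise a ValueError:

```python
import pandas as pd

test = pd.Series([2.4, 5.6, 8.8, 25.6, 53.6, 1.7, 5.7, 8.9])

# diff() is negative only at the drop; lt(0) turns that into a boolean flag;
# cumsum() then labels everything before the drop 0 and everything after it 1.
key = test.diff().lt(0).cumsum()

# One group per label, unpacked straight into two Series.
test1, test2 = (g for _, g in test.groupby(key))
```

Each group keeps its original index labels (0 to 4 and 5 to 7), matching the output shown above.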

Split a dataframe when a number is lower than the previous number

3 answers: