Python - Slice only the descending parts of a dataset

Time: 2018-08-17 15:51:30

Tags: python pandas numpy

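I have a time series with several downcasts. How can I slice a pandas DataFrame (here an array, for simplicity) to get only the data on the descending parts of the time series, together with their indices?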

import matplotlib.pyplot as plt
import numpy as np


b = np.asarray([  1.3068586 ,   1.59882279,   2.11291473,   2.64699527,
     3.23948166,   3.81979878,   4.37630243,   4.97740025,
     5.59247254,   6.18671493,   6.77414586,   7.43078595,
     8.02243495,   8.59612224,   9.22302662,   9.83263379,
    10.43125902,  11.0956864 ,  11.61107838,  12.09616684,
    12.63973254,  12.49437955,  11.6433792 ,  10.61083269,
     9.50534291,   8.47418827,   7.40571742,   6.56611512,
     5.66963658,   4.89748187,   4.10543794,   3.44828054,
     2.76866318,   2.24306623,   1.68034463,   1.26568186,
     1.44548443,   2.01225076,   2.60715524,   3.21968562,
     3.8622007 ,   4.57035958,   5.14021305,   5.77879484,
     6.42776897,   7.09397923,   7.71722028,   8.30860725,
     8.96652218,   9.66157193,  10.23469208,  10.79889453,
    10.5788411 ,   9.38270646,   7.82070643,   6.74893389,
     5.68200335,   4.73429009,   3.78358222,   3.05924946,
     2.30428171,   1.78052369,   1.27897065,   1.16840532,
     1.59452726,   2.13085096,   2.70989933,   3.3396291 ,
     3.97318058,   4.62429262,   5.23997774,   5.91232803,
     6.5906609 ,   7.21099657,   7.82936331,   8.49636247,
     9.15634983,   9.76450244,  10.39680729,  11.04659976,
    11.69287237,  12.35692643,  12.99957563,  13.66228386,
    14.31806385,  14.91871927,  15.57212978,  16.22288287,
    16.84697357,  17.50502002,  18.15907842,  18.83068151,
    19.50945548,  20.18020639,  20.84441358,  21.52792846,
    22.17933087,  22.84614545,  23.51212887,  24.18308399,
    24.8552263 ,  25.51709528,  26.18724379,  26.84531493,
    27.50690265,  28.16610365,  28.83394822,  29.49621179,
    30.15118676,  30.8019521 ,  31.46714114,  32.1213546 ,
    32.79366952,  33.45233007,  34.12158193,  34.77502197,
    35.4532211 ,  36.11018053,  36.76540453,  37.41746323])

plt.plot(-b)
plt.show()

3 answers:

Answer 0 (score: 1)

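Create a second DataFrame with every value shifted by one index, put it next to the original, and subtract the two element-wise. Keeping only the rows with a negative difference gives you what you want. Here: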

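import pandas as pd

df = pd.DataFrame(-b)                      # -b is the series that is plotted
df = pd.concat([df.shift(1), df], axis=1)  # previous value next to current value
df.columns = ['t-1', 't']
df = df.drop(df.index[0])                  # first row has no previous value (NaN)
df['diff'] = df['t'] - df['t-1']
res = df[df['diff'] < 0]                   # keep only the descending steps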
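The shift-and-subtract step computes the same thing as pandas' `Series.diff()`, which returns `s - s.shift(1)`. A shorter sketch of the same filter, using a short hypothetical stand-in for `b` rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical short stand-in for the question's array b (not the real data)
b = np.array([1.0, 2.0, 3.5, 3.0, 1.5, 2.5, 4.0])

s = pd.Series(-b)              # the plotted series
shifted = s - s.shift(1)       # the answer's "t - (t-1)" column
d = s.diff()                   # pandas' built-in first difference

# Both have NaN in position 0 and identical values elsewhere
assert shifted.equals(d)

res = s[d < 0]                 # descending steps; the original index is kept
print(res.index.tolist())      # [1, 2, 5, 6]
```

Using `diff()` also removes the need for the extra columns and the manual drop of the first row, since `NaN < 0` is simply False.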

Answer 1 (score: 1)

You can set every point with a non-negative difference to NaN and then plot:

import pandas as pd

bb = pd.Series(-b)              # the plotted series
bb[bb.diff().ge(0)] = np.nan    # hide every point where the curve does not go down
bb.plot()


To get the indices of the descending points, use:

bb.index[bb.diff().lt(0)]

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
             14,  15,  16,  17,  18,  19,  20,  37,  38,  39,  40,  41,  42,
             43,  44,  45,  46,  47,  48,  49,  50,  51,  65,  66,  67,  68,
             69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,
             82,  83,  84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,
             95,  96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107,
            108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119],
           dtype='int64')
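The assignment `bb[...] = np.nan` modifies `bb` in place, so the index lookup above runs on the already-masked series: the point just before each later descent is NaN, its difference is NaN, and the listing omits 36 and 64, the first descending step of the second and third descents. A sketch that avoids this, assuming pandas' `Series.mask` for the plotting copy, takes the indices from an untouched series (`b` below is a short hypothetical stand-in):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for b: a descent, a short ascent, then a second descent
b = np.array([1.0, 2.0, 3.0, 2.5, 1.0, 1.5, 2.5])

s = pd.Series(-b)         # the plotted series, never modified below
d = s.diff()              # first difference, NaN at position 0

masked = s.mask(d.ge(0))  # copy with non-descending points set to NaN, for plotting
idx = s.index[d.lt(0)]    # indices of descending points, from the unmodified series

# masked.plot() would draw the same figure as the answer
print(idx.tolist())       # [1, 2, 5, 6]
```

With the in-place version, index 5 would be missing from this small example for the same reason.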

Answer 2 (score: 1)

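There is also a simple NumPy-only solution using np.where (the question is tagged pandas, but the code only uses NumPy). You want the points where the plotted curve (-b) descends, which means the data b itself is ascending.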

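# the indices where the data is ascending
ix, = np.where(np.diff(b) > 0)
# the values
c = b[ix]

Note that this gives you the first value of each ascending consecutive pair, whereas the pandas-based solution gives you the second one. To get the same indices, just add 1 to ix.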
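To get each descent (each downcast) as its own slice rather than one flat index array, a sketch building on `np.diff` is to split the ascending indices wherever they stop being consecutive. The run-splitting with `np.split` is an addition not in the answer, and `b` below is a short hypothetical stand-in:

```python
import numpy as np

# Hypothetical stand-in for b: two separate descents in the plotted curve -b
b = np.array([1.0, 2.0, 3.0, 2.5, 1.0, 1.5, 2.5, 3.5])

# Second index of each ascending pair in b, i.e. of each descending step in -b
ix = np.flatnonzero(np.diff(b) > 0) + 1

# Start a new run wherever two consecutive indices are not adjacent
runs = np.split(ix, np.flatnonzero(np.diff(ix) > 1) + 1) if ix.size else []

# Include the point each run starts from, so every slice is a full descent
segments = [b[r[0] - 1 : r[-1] + 1] for r in runs]
```

Here `runs` holds the index arrays `[1, 2]` and `[5, 6, 7]`, and `segments` holds the corresponding pieces of `b`, each starting from the point where that descent begins.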