Sorting a numpy array according to labels

Time: 2014-01-10 13:00:59

Tags: python arrays sorting numpy

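I am new to python / numpy / scipy. I have a series of measurement values that lie between 41 and 48. The series is supposed to be periodic and to contain values above 44. It typically consists of 2 values above 44, then 25 values below 44, then 4 values above 44, then again 25 values below 44, and this pattern repeats indefinitely. Currently I keep the whole series in a numpy array sequence_np[k]; here is an example of typical measurements for k = 0 to k = 100: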

Out[203]:
array([ 380.01873779,   50.55169296,   42.77656174,   42.65046692,
     42.43999863,   42.40890503,   42.2605629 ,   42.10827637,
     42.1680069 ,   42.11898422,   42.12486267,   42.04322433,
     42.0951767 ,   41.91516495,   42.06472015,   41.93815613,
     41.89348221,   41.94158936,   41.89970016,   41.9259491 ,
     41.81912231,   41.86272049,   41.75209045,   41.80240631,
     41.76036072,   45.26025391,   46.82533646,   46.50036621,
     44.32830048,   41.76332855,   41.78057098,   41.67221069,
     41.64173889,   41.65129471,   41.61840439,   41.69548798,
     41.67128372,   41.62164307,   41.44159698,   41.66975021,
     41.57759857,   41.61584091,   41.49560165,   41.56158066,
     41.56072617,   41.67589951,   41.53749466,   41.40168762,
     41.61867142,   41.5602684 ,   41.50307083,   41.52831268,
     45.30022049,   48.27410507,   41.43214798,   41.47129059,
     41.51285934,   41.50564957,   41.51237106,   41.46631241,
     41.47317505,   41.51570129,   41.50091553,   41.43928528,
     41.4920578 ,   41.46780014,   41.40225983,   41.38835144,
     41.38163376,   41.4108696 ,   41.46929169,   41.36049652,
     41.50559998,   41.37825775,   41.51349258,   41.33029938,
     41.39609909,   41.4021492 ,   41.35882187,   44.03855896,
     47.99585342,   48.23322678,   44.07878113,   41.41098022,
     41.24314117,   41.35625839,   41.45568848,   41.24409866,
     41.36055756,   41.3132782 ,   41.46337891,   41.24599838,
     41.31368256,   41.22081375,   41.38015366,   41.34594345,
     41.31343842,   41.25972748,   41.36122131,   41.23865891,
     41.39728546], dtype=float32)

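I would like to sort the array based on the values above 44, ending up with separate lists or arrays:

  • the values located between the two values above 44 and the four values above 44: I call these value_A[i]
  • the values located between the four values above 44 and the two values above 44: value_B[i]

for i = 1 to N, where N depends on the length of my series. The series can start anywhere, i.e. not necessarily with the two values above 44.

For example, value_A[0] would start at k = 2 and run up to (but not including) k = 25, i.e. value_A[0] = sequence_np[2:25]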

Out[206]:
array([ 42.77656174,  42.65046692,  42.43999863,  42.40890503,
    42.2605629 ,  42.10827637,  42.1680069 ,  42.11898422,
    42.12486267,  42.04322433,  42.0951767 ,  41.91516495,
    42.06472015,  41.93815613,  41.89348221,  41.94158936,
    41.89970016,  41.9259491 ,  41.81912231,  41.86272049,
    41.75209045,  41.80240631,  41.76036072], dtype=float32)

value_B[0] = sequence_np[29:52]

Out[218]:
array([ 41.76332855,  41.78057098,  41.67221069,  41.64173889,
    41.65129471,  41.61840439,  41.69548798,  41.67128372,
    41.62164307,  41.44159698,  41.66975021,  41.57759857,
    41.61584091,  41.49560165,  41.56158066,  41.56072617,
    41.67589951,  41.53749466,  41.40168762,  41.61867142,
    41.5602684 ,  41.50307083,  41.52831268], dtype=float32)

and so on. So far, I have defined a boolean array IndexSequence with the following command:

IndexSequence = sequence_np>44

and I get the following output:

Out[182]: array([ True,  True, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False,  True,  True,
    True,  True, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False,  True,  True,
   False, False, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False,  True,  True,
    True,  True, False, False, False, False, False, False, False,
   False, False, False, False, False, False, False, False, False,
   False, False], dtype=bool)

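So I manage to find the values below 44, but I do not manage to sort them.

Any ideas or links on how to go further with sorting sequence_np according to the scheme explained above are welcome.

Thanks in advance,

Greg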

2 Answers:

Answer 0: (score: 0)

I assume your array is called a; then try

import numpy

starts = numpy.intersect1d(numpy.where(a < 44)[0] + 1, numpy.where(a > 44)[0])
print(numpy.split(a, starts))

The first line identifies the indices i where a[i-1] < 44 and a[i] > 44; the second splits the array at those indices.
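
As a rough illustration of what this split produces, here is a minimal sketch on a small made-up array (the values are hypothetical, not the measurements from the question). It assumes no value is exactly 44. Each chunk holds one run above 44 followed by the run below 44 that comes after it, so the below-44 part can be taken with a boolean mask; the names chunks and low_runs are only for illustration.

```python
import numpy as np

# Hypothetical toy data: runs above 44 separated by runs below 44.
a = np.array([45.0, 46.0, 42.0, 41.9,
              45.5, 47.0, 46.0, 44.5, 41.8, 41.7,
              45.1, 48.0, 41.5], dtype=np.float32)

# Indices i with a[i-1] < 44 and a[i] > 44, i.e. where a run above 44 begins.
starts = np.intersect1d(np.where(a < 44)[0] + 1, np.where(a > 44)[0])

# Each chunk is one run above 44 followed by the run below 44 after it.
chunks = np.split(a, starts)

# Keep only the below-44 part of each chunk.
low_runs = [chunk[chunk < 44] for chunk in chunks]

print(starts)
print([len(run) for run in low_runs])
```

If the array begins with values below 44, the first chunk contains only that leading low run.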

Answer 1: (score: 0)

Using the standard library:

>>> from itertools import groupby
>>> groups = [list(g[1]) for g in groupby(a, lambda x: x > 44)]
>>> print(groups)

[[380.01874, 50.551693], [42.776562, 42.650467, 42.439999, 42.408905, 
42.260563, 42.108276, 42.168007, 42.118984, 42.124863, 42.043224, 
42.095177, 41.915165, 42.06472, 41.938156, 41.893482, 41.941589, 
41.8997, 41.925949, 41.819122, 41.86272, 41.75209, 41.802406, 41.760361], 
[45.260254, 46.825336, 46.500366, 44.3283], [41.763329, 41.780571, 
41.672211, 41.641739, 41.651295, 41.618404, 41.695488, 41.671284, 
41.621643, 41.441597, 41.66975, 41.577599, 41.615841, 41.495602, 
41.561581, 41.560726, 41.6759, 41.537495, 41.401688, 41.618671, 
41.560268, 41.503071, 41.528313], [45.30022, 48.274105], [41.432148, 
41.471291, 41.512859, 41.50565, 41.512371, 41.466312, 41.473175, 
41.515701, 41.500916, 41.439285, 41.492058, 41.4678, 41.40226, 
41.388351, 41.381634, 41.41087, 41.469292, 41.360497, 41.5056, 41.378258, 
41.513493, 41.330299, 41.396099, 41.402149, 41.358822], [44.038559, 
47.995853, 48.233227, 44.078781], [41.41098, 41.243141, 41.356258, 
41.455688, 41.244099, 41.360558, 41.313278, 41.463379, 41.245998, 
41.313683, 41.220814, 41.380154, 41.345943, 41.313438, 41.259727, 
41.361221, 41.238659, 41.397285]]

If that is what you need, it is then straightforward to group this list into pairs:

pairs = list(zip(*[iter(groups)] * 2))

The result will be a list of 2-tuples, each consisting of two lists (the values above 44 and the values below 44).
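
To get from these groups to the value_A / value_B lists described in the question, one possible sketch is to classify each run below 44 by the length of the run above 44 that precedes it. This assumes, as in the question, that a run of 2 high values precedes value_A and a run of 4 high values precedes value_B. The helper name split_runs, the threshold parameter and the toy data are hypothetical and only for illustration.

```python
from itertools import groupby

def split_runs(values, threshold=44.0):
    """Return (value_A, value_B): the below-threshold runs that follow a
    2-value high run and a 4-value high run, respectively."""
    value_A, value_B = [], []
    prev_high_len = None  # length of the most recent run above the threshold
    for is_high, run in groupby(values, lambda x: x > threshold):
        run = list(run)
        if is_high:
            prev_high_len = len(run)
        elif prev_high_len == 2:
            value_A.append(run)
        elif prev_high_len == 4:
            value_B.append(run)
        # Low runs with no (or an unexpected) preceding high run are skipped.
    return value_A, value_B

# Hypothetical toy data, not the measurements from the question.
data = [41.0, 45.0, 46.0, 42.0, 41.9,
        45.5, 47.0, 46.0, 44.5, 41.8, 41.7,
        45.1, 48.0, 41.5]
value_A, value_B = split_runs(data)
print(value_A)
print(value_B)
```

Unlike the zip-based pairing, this does not depend on the series starting with a run above 44. A high run that is cut off at the very start of the data could still be miscounted, so the first group may need checking by hand.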