NumPy version of finding the positions of the highest and lowest values within the intervals of another column?

Time: 2017-04-04 21:48:18

Tags: python performance numpy scipy

Given the following numpy array, how can I use numpy to find the positions of the highest and lowest values of column 0 within the intervals of column 1?

import numpy as np

data = np.array([
    [1879.289,np.nan],[1879.281,np.nan],[1879.292,1],[1879.295,1],[1879.481,1],[1879.294,1],[1879.268,1],
    [1879.293,1],[1879.277,1],[1879.285,1],[1879.464,1],[1879.475,1],[1879.971,1],[1879.779,1],
    [1879.986,1],[1880.791,1],[1880.29,1],[1879.253,np.nan],[1878.268,np.nan],[1875.73,1],[1876.792,1],
    [1875.977,1],[1876.408,1],[1877.159,1],[1877.187,1],[1883.164,1],[1883.171,1],[1883.495,1],
    [1883.962,1],[1885.158,1],[1885.974,1],[1886.479,np.nan],[1885.969,np.nan],[1884.693,1],[1884.977,1],
    [1884.967,1],[1884.691,1],[1886.171,1],[1886.166,np.nan],[1884.476,np.nan],[1884.66,1],[1882.962,1],
    [1881.496,1],[1871.163,1],[1874.985,1],[1874.979,1],[1871.173,np.nan],[1871.973,np.nan],[1871.682,np.nan],
    [1872.476,np.nan],[1882.361,1],[1880.869,1],[1882.165,1],[1881.857,1],[1880.375,1],[1880.66,1],
    [1880.891,1],[1880.377,1],[1881.663,1],[1881.66,1],[1877.888,1],[1875.69,1],[1875.161,1],
    [1876.697,np.nan],[1876.671,np.nan],[1879.666,np.nan],[1877.182,np.nan],[1878.898,1],[1878.668,1],[1878.871,1],
    [1878.882,1],[1879.173,1],[1878.887,1],[1878.68,1],[1878.872,1],[1878.677,1],[1877.877,1],
    [1877.669,1],[1877.69,1],[1877.684,1],[1877.68,1],[1877.885,1],[1877.863,1],[1877.674,1],
    [1877.676,1],[1877.687,1],[1878.367,1],[1878.179,1],[1877.696,1],[1877.665,1],[1877.667,np.nan],
    [1878.678,np.nan],[1878.661,1],[1878.171,1],[1877.371,1],[1877.359,1],[1878.381,1],[1875.185,1],
    [1875.367,np.nan],[1865.492,np.nan],[1865.495,1],[1866.995,1],[1866.672,1],[1867.465,1],[1867.663,1],
    [1867.186,1],[1867.687,1],[1867.459,1],[1867.168,1],[1869.689,1],[1869.693,1],[1871.676,1],
    [1873.174,1],[1873.691,np.nan],[1873.685,np.nan]
])

+-------+----------+-----------+---------+
| index |  Value   | Intervals | Min/Max |
+-------+----------+-----------+---------+
|     0 | 1879.289 | np.nan    |         |
|     1 | 1879.281 | np.nan    |         |
|     2 | 1879.292 | 1         |         |
|     3 | 1879.295 | 1         |         |
|     4 | 1879.481 | 1         |         |
|     5 | 1879.294 | 1         |         |
|     6 | 1879.268 | 1         | -1      | min
|     7 | 1879.293 | 1         |         |
|     8 | 1879.277 | 1         |         |
|     9 | 1879.285 | 1         |         |
|    10 | 1879.464 | 1         |         |
|    11 | 1879.475 | 1         |         |
|    12 | 1879.971 | 1         |         |
|    13 | 1879.779 | 1         |         |
|    17 | 1879.986 | 1         |         |
|    18 | 1880.791 | 1         |  1      | max
|    19 |  1880.29 | 1         |         |
|    55 | 1879.253 | np.nan    |         |
|    56 | 1878.268 | np.nan    |         |
|    57 |  1875.73 | 1         | -1      |min
|    58 | 1876.792 | 1         |         |
|    59 | 1875.977 | 1         |         | 
|    60 | 1876.408 | 1         |         |
|    61 | 1877.159 | 1         |         |
|    62 | 1877.187 | 1         |         |
|    63 | 1883.164 | 1         |         |
|    64 | 1883.171 | 1         |         |
|    65 | 1883.495 | 1         |         |
|    66 | 1883.962 | 1         |         |
|    67 | 1885.158 | 1         |         |
|    68 | 1885.974 | 1         |  1      | max
|    69 | 1886.479 | np.nan    |         |
|    70 | 1885.969 | np.nan    |         |
|    71 | 1884.693 | 1         |         |
|    72 | 1884.977 | 1         |         |
|    73 | 1884.967 | 1         |         |
|    74 | 1884.691 | 1         | -1      | min
|    75 | 1886.171 | 1         |  1      | max
|    76 | 1886.166 | np.nan    |         |
|    77 | 1884.476 | np.nan    |         |
|    78 |  1884.66 | 1         |  1      | max
|    79 | 1882.962 | 1         |         |
|    80 | 1881.496 | 1         |         |
|    81 | 1871.163 | 1         | -1      | min
|    82 | 1874.985 | 1         |         |
|    83 | 1874.979 | 1         |         |
|    84 | 1871.173 | np.nan    |         |
|    85 | 1871.973 | np.nan    |         |
|    86 | 1871.682 | np.nan    |         |
|    87 | 1872.476 | np.nan    |         |
|    88 | 1882.361 | 1         |  1      | max
|    89 | 1880.869 | 1         |         |
|    90 | 1882.165 | 1         |         |
|    91 | 1881.857 | 1         |         |
|    92 | 1880.375 | 1         |         |
|    93 |  1880.66 | 1         |         |
|    94 | 1880.891 | 1         |         |
|    95 | 1880.377 | 1         |         |
|    96 | 1881.663 | 1         |         |
|    97 |  1881.66 | 1         |         |
|    98 | 1877.888 | 1         |         |
|    99 |  1875.69 | 1         |         |
|   100 | 1875.161 | 1         | -1      | min
|   101 | 1876.697 | np.nan    |         |
|   102 | 1876.671 | np.nan    |         |
|   103 | 1879.666 | np.nan    |         |
|   111 | 1877.182 | np.nan    |         |
|   112 | 1878.898 | 1         |         |
|   113 | 1878.668 | 1         |         |
|   114 | 1878.871 | 1         |         |
|   115 | 1878.882 | 1         |         |
|   116 | 1879.173 | 1         |  1      | max
|   117 | 1878.887 | 1         |         |
|   118 |  1878.68 | 1         |         |
|   119 | 1878.872 | 1         |         |
|   120 | 1878.677 | 1         |         |
|   121 | 1877.877 | 1         |         |
|   122 | 1877.669 | 1         |         |
|   123 |  1877.69 | 1         |         |
|   124 | 1877.684 | 1         |         |
|   125 |  1877.68 | 1         |         |
|   126 | 1877.885 | 1         |         |
|   127 | 1877.863 | 1         |         |
|   128 | 1877.674 | 1         |         |
|   129 | 1877.676 | 1         |         |
|   130 | 1877.687 | 1         |         |
|   131 | 1878.367 | 1         |         |
|   132 | 1878.179 | 1         |         |
|   133 | 1877.696 | 1         |         |
|   134 | 1877.665 | 1         | -1      | min
|   135 | 1877.667 | np.nan    |         |
|   136 | 1878.678 | np.nan    |         |
|   137 | 1878.661 | 1         |  1      | max
|   138 | 1878.171 | 1         |         |
|   139 | 1877.371 | 1         |         |
|   140 | 1877.359 | 1         |         |
|   141 | 1878.381 | 1         |         |
|   142 | 1875.185 | 1         | -1      | min
|   143 | 1875.367 | np.nan    |         |
|   144 | 1865.492 | np.nan    |         |
|   145 | 1865.495 | 1         |  -1     | min
|   146 | 1866.995 | 1         |         |
|   147 | 1866.672 | 1         |         |
|   148 | 1867.465 | 1         |         |
|   149 | 1867.663 | 1         |         |
|   150 | 1867.186 | 1         |         |
|   151 | 1867.687 | 1         |         |
|   152 | 1867.459 | 1         |         |
|   153 | 1867.168 | 1         |         |
|   154 | 1869.689 | 1         |         |
|   155 | 1869.693 | 1         |         |
|   156 | 1871.676 | 1         |         |
|   157 | 1873.174 | 1         | 1       | max
|   158 | 1873.691 | np.nan    |         |
|   159 | 1873.685 | np.nan    |         |
+-------+----------+-----------+---------+

In the Min/Max column of the table above, -1 marks the minimum and 1 marks the maximum of each interval.

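I should say upfront that this question has already been answered here with a pandas solution. That solution performs reasonably, at about 300 seconds for a table of roughly 1 million rows. After some more testing, however, I found that once the table exceeds 3 million rows the execution time rises sharply to more than 2500 seconds. That is clearly too long for such a simple task. How can the same problem be solved with numpy?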
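To pin down the expected output, here is a minimal loop-based sketch of the task. It is not the pandas solution mentioned above and is not meant to be fast; the function name `mark_extrema_loop` is made up for illustration, and it assumes that an interval is a maximal block of consecutive rows whose column 1 is not NaN. It can serve as a reference for checking a vectorized version on small inputs.

```python
import numpy as np

def mark_extrema_loop(data):
    """Reference implementation (hypothetical helper): plain Python loop."""
    values, flags = data[:, 0], data[:, 1]
    out = np.full(len(data), np.nan)
    start = None                       # first row of the interval being scanned
    # Run one step past the end so an interval touching the last row is closed.
    for i in range(len(data) + 1):
        valid = i < len(data) and not np.isnan(flags[i])
        if valid and start is None:
            start = i                  # an interval opens here
        elif not valid and start is not None:
            block = values[start:i]    # interval covers rows start .. i-1
            out[start + block.argmin()] = -1
            out[start + block.argmax()] = 1
            start = None
    return out

demo = np.array([[5, 1], [3, 1], [9, np.nan], [2, 1],
                 [7, 1], [4, 1], [1, np.nan], [8, 1]], dtype=float)
print(mark_extrema_loop(demo).tolist())
# [1.0, -1.0, nan, -1.0, 1.0, nan, nan, 1.0]
```

Note that a one-row interval (the last row of `demo`) is both its own minimum and maximum; because the maximum is written last, it ends up marked 1.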

2 Answers:

Answer 0: (score: 1)

Here's one NumPy approach:

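import numpy as np

# True on rows that belong to an interval (column 1 is not NaN)
mask = ~np.isnan(data[:,1])

# Start (s0) and one-past-the-end (s1) row of each interval; this assumes
# the data begins and ends with NaN rows, so every interval has both edges
s0 = np.flatnonzero(mask[1:] > mask[:-1])+1
s1 = np.flatnonzero(mask[1:] < mask[:-1])+1
lens = s1 - s0

# Sort the valid rows by interval id first, then by value within each interval
tags = np.repeat(np.arange(len(lens)), lens)
idx  = np.lexsort((data[mask,0], tags))

# Boundaries of each interval inside the sorted, masked array
starts = np.r_[0,lens.cumsum()]

# Number of NaN rows preceding each interval, used to map positions in the
# masked array back to row numbers in data
offsets = np.r_[s0[0], s0[1:] - s1[:-1]]
offsets_cumsum = offsets.cumsum()

min_ids = idx[starts[:-1]] + offsets_cumsum
max_ids = idx[starts[1:]-1] + offsets_cumsum

# -1 marks each interval's minimum, 1 its maximum, NaN everything else
out = np.full(data.shape[0], np.nan)
out[min_ids] = -1
out[max_ids] = 1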
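The snippet above relies on the data starting and ending with NaN rows. As a sketch of one way to relax that, the variation below pads the mask with False on both sides before looking for edges, and maps sorted positions back to row numbers with `np.flatnonzero(mask)` instead of the cumulative offsets. The function name `mark_interval_extrema` is hypothetical, and this is a variation on the same lexsort idea rather than the answer's exact code.

```python
import numpy as np

def mark_interval_extrema(data):
    """Hypothetical helper: -1 at each interval's minimum, 1 at its maximum."""
    values = data[:, 0]
    mask = ~np.isnan(data[:, 1])

    # Pad with False so intervals touching either end of the array still
    # produce both a start edge and an end edge.
    padded = np.r_[False, mask, False]
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    s0, s1 = edges[::2], edges[1::2]      # interval i covers rows s0[i] .. s1[i]-1
    lens = s1 - s0

    out = np.full(len(data), np.nan)
    if lens.size == 0:                    # no valid rows at all
        return out

    # Sort valid rows by (interval id, value): within each interval's block
    # the smallest value comes first and the largest comes last.
    rows = np.flatnonzero(mask)           # original row number of each valid row
    tags = np.repeat(np.arange(len(lens)), lens)
    order = np.lexsort((values[rows], tags))

    ends = lens.cumsum()
    starts = ends - lens
    out[rows[order[starts]]] = -1
    out[rows[order[ends - 1]]] = 1
    return out

demo = np.array([[5, 1], [3, 1], [9, np.nan], [2, 1],
                 [7, 1], [4, 1], [1, np.nan], [8, 1]], dtype=float)
print(mark_interval_extrema(demo).tolist())
# [1.0, -1.0, nan, -1.0, 1.0, nan, nan, 1.0]
```

The whole computation is a single sort plus a few index operations, so there is no Python-level loop over intervals.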

Answer 1: (score: 1)

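This is cheating a bit, because it uses scipy:

import numpy as np
from scipy import ndimage

# Label rows by the running count of NaN markers; the last NaN row before
# an interval shares that interval's label
markers = np.isnan(data[:, 1])
groups = np.cumsum(markers)

# index=range(2, ..., 2) assumes exactly two NaN rows before every interval.
# ndimage.extrema is the same function under its current public name, and
# "maxs" avoids shadowing the built-in max
mins, maxs, min_idx, max_idx = ndimage.extrema(
    data[:, 0], labels=groups, index=range(2, groups.max(), 2))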
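The sample data also contains gaps of four NaN rows, so the fixed step of 2 may not line up with every interval. A possible variation, sketched here under that assumption, labels the runs of valid rows directly with `ndimage.label` (swapped in for the cumulative-sum labelling) so that NaN rows get label 0 and never join an interval. The function name `interval_extrema_scipy` is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def interval_extrema_scipy(data):
    """Hypothetical helper: row numbers of each interval's min and max."""
    values = data[:, 0]
    mask = ~np.isnan(data[:, 1])

    # Label each run of consecutive valid rows 1..n; NaN rows get label 0.
    labels, n = ndimage.label(mask)
    if n == 0:
        empty = np.array([], dtype=int)
        return empty, empty

    mins, maxs, min_pos, max_pos = ndimage.extrema(
        values, labels=labels, index=np.arange(1, n + 1))

    # Positions come back as one coordinate tuple per label; the input is
    # one-dimensional, so the first entry is the row number.
    min_idx = np.array([p[0] for p in min_pos])
    max_idx = np.array([p[0] for p in max_pos])
    return min_idx, max_idx

demo = np.array([[5, 1], [3, 1], [9, np.nan], [2, 1],
                 [7, 1], [4, 1], [1, np.nan], [8, 1]], dtype=float)
min_idx, max_idx = interval_extrema_scipy(demo)
print(min_idx.tolist(), max_idx.tolist())
# [1, 3, 7] [0, 4, 7]
```

The two index arrays can then be used to fill an output column, for example `out[min_idx] = -1` followed by `out[max_idx] = 1`.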