How do I find how long list elements stay above a given threshold?

Time: 2019-09-21 09:28:11

Tags: python python-3.x numpy

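I am analyzing EEG data for my honours project. Specifically, I am analyzing EEG bursts (i.e. the number of times a criterion is exceeded). An EEG value is recorded every second. In my case, the criterion for a burst is any time the EEG value exceeds the 5-minute mean by 15%. However, I would also like to find how long the signal stays more than 15% above the mean. How should I go about this?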

Edit: here is the data for one 5-minute block, regions2[0][0][0]:

array([2.6749268, 2.3965783, 2.2648365, 2.2901094, 2.5218956, 2.6369736,
   2.3637865, 2.1217346, 1.895203 , 2.055559 , 1.9365673, 2.4218671,
   2.530769 , 2.385391 , 2.293663 , 2.3126507, 2.323733 , 2.2733889,
   2.3903291, 2.6176193, 2.6430926, 2.58586  , 2.352392 , 2.4955454,
   2.6099124, 2.3200274, 2.1760976, 2.5159674, 2.76305  , 2.3733828,
   2.4342089, 2.4008656, 2.2075768, 2.232682 , 2.2406263, 2.4858663,
   2.4188566, 2.5680597, 2.5303915, 2.3958497, 2.2115357, 2.444274 ,
   2.5103524, 2.0567694, 2.441487 , 2.430129 , 2.4614134, 2.282298 ,
   2.4610975, 2.5782802, 2.3088896, 2.660237 , 2.8228939, 2.386515 ,
   1.9969627, 2.0703123, 2.891341 , 2.929259 , 2.3676789, 2.39686  ,
   2.559953 , 2.4817688, 2.4235504, 2.2657301, 2.6064477, 2.6751654,
   2.5263813, 2.1663566, 2.2710345, 2.6688013, 2.1095626, 2.560567 ,
   2.6420567, 2.3834925, 2.4658787, 2.6067703, 2.5786612, 2.6147954,
   2.5842502, 2.5785747, 2.427758 , 2.2909386, 2.2653525, 2.382083 ,
   2.5664327, 2.5153337, 1.820536 , 2.5582454, 2.3047743, 2.0991004,
   2.4578576, 2.7292717, 2.6083386, 2.4281838, 2.453028 , 2.4099083,
   2.3806388, 2.3578563, 2.590041 , 2.621177 , 2.3468106, 2.145658 ,
   2.077852 , 2.2439861, 2.6040363, 2.7262418, 2.5456822, 2.4032714,
   2.4305286, 2.3440735, 2.4467494, 2.8309298, 2.484087 , 2.4205194,
   2.9501045, 2.9746544, 2.4083674, 2.3036501, 2.6792996, 2.3589804,
   2.3387434, 3.1610718, 3.1351097, 2.5165584, 2.52014  , 2.4717925,
   2.3442857, 2.4484215, 2.6329467, 2.5656624, 2.1032746, 2.6107414,
   2.6543136, 2.3989596, 2.491326 , 2.448652 , 2.0739408, 2.1546881,
   2.2125206, 2.3453302, 2.206572 , 2.5481203, 2.4757648, 2.321009 ,
   2.151885 , 2.5293174, 2.4324925, 2.5090148, 2.511013 , 2.3891613,
   2.410615 , 2.4933898, 2.5637872, 2.5406888, 2.3262205, 2.3038528,
   2.3525562, 2.5020993, 2.418803 , 2.3449333, 2.3671083, 2.2996266,
   2.3794844, 2.567588 , 2.4661932, 2.2262957, 2.23916  , 2.4768639,
   2.5996208, 2.0967448, 2.6293585, 2.63859  , 2.1346848, 2.396465 ,
   2.5590088, 2.7100194, 2.5844605, 2.509511 , 2.3888776, 2.3450603,
   2.3059156, 2.3003197, 2.265796 , 2.409108 , 2.3471978, 2.3690042,
   2.3681839, 2.1804225, 2.3097868, 2.5214322, 2.5647783, 2.2263277,
   2.237076 , 2.1725235, 2.7022493, 2.6051672, 2.3920796, 2.5295384,
   2.3592722, 2.3246412, 2.6447232, 2.471837 , 2.7084064, 2.891163 ,
   2.853586 , 2.5431767, 2.626647 , 2.384906 , 2.4660795, 2.3987703,
   1.952864 , 2.200941 , 2.0904007, 2.3755703, 2.4843922, 2.2047417,
   2.356966 , 2.3752437, 2.3846717, 2.306039 , 2.1720207, 2.3954203,
   2.3085623, 2.3531506, 1.9799676, 2.211963 , 2.141896 , 2.23061  ,
   2.3369045, 2.0759099, 2.3769715, 2.2083194, 2.3833442, 2.519347 ,
   2.3158543, 2.349166 , 2.4172142, 2.4422796, 2.513461 , 2.3867416,
   2.2908096, 2.5697057, 2.5763984, 2.319024 , 2.2902663, 2.6102533,
   2.6331408, 2.0932658, 2.323995 , 2.5237315, 2.5517514, 2.2664344,
   2.6006377, 2.3638246, 2.1815908, 2.440507 , 2.319951 , 2.399297 ,
   2.4986413, 2.3624146, 2.2377589, 2.277511 , 2.2512653, 2.1235788,
   2.1558921, 2.0374463, 2.2526782, 2.3825428, 2.2467673, 2.2891142,
   2.2968547, 2.2971904, 2.3470707, 2.2792215, 2.4101918, 2.4403389,
   2.674443 , 2.7500827, 2.4176645, 2.4565153, 2.5339708, 2.545448 ,
   2.3907104, 2.4448986, 2.571172 , 2.3371966, 2.5683126, 2.6280603,
   2.4276655, 2.4700544, 2.421201 , 2.499506 , 2.555047 , 2.7138271,
   2.546179 , 2.484039 , 2.3956237, 2.591265 , 2.6552162, 2.6930716],
  dtype=float32)

1.15 * np.mean(regions2[0][0][0])
Out[17]: 2.783561062812805

np.where(regions2[0][0][0] > 1.15 * np.mean(regions2[0][0][0]))
Out[13]: (array([ 52,  56,  57, 111, 114, 115, 121, 122, 203, 204], dtype=int64),)

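Edit: looking at the indexes of the values greater than 1.15 times the mean, I realised that they give the durations I am looking for. A lone index such as 52 means 1 second; 56 and 57 (2 consecutive indexes) mean 2 seconds. How can I write a script that computes the mean burst duration for each 5-minute block? In this example the mean would be (1 s + 2 s + 1 s + 2 s + 2 s + 2 s) / 6 ≈ 1.67 s.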
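The grouping described in this edit can be sketched directly from the index array. This is only an illustration: it hard-codes the indexes printed by np.where above, and the names idx, breaks, runs and durations are made up for the example. It assumes one sample per second, so a run length in samples equals a duration in seconds.

```python
import numpy as np

# Indexes copied from the np.where output above (values above 1.15 * mean).
idx = np.array([52, 56, 57, 111, 114, 115, 121, 122, 203, 204])

# A new run starts wherever the gap to the previous index is not exactly 1.
breaks = np.flatnonzero(np.diff(idx) != 1) + 1
runs = np.split(idx, breaks)

# Length of each run in samples; at 1 sample per second this is seconds.
durations = np.array([len(r) for r in runs])
mean_duration = durations.mean()

print(durations)      # one entry per burst
print(mean_duration)  # mean burst duration for this block
```

For this block the runs are 52, 56-57, 111, 114-115, 121-122 and 203-204, which matches the hand count in the edit. The sketch does not handle a block with no values above the threshold (an empty idx).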

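I have attached a working snippet that counts the bursts by iterating over the ndarray and comparing two consecutive EEG values: the burst count is incremented by 1 (burst += 1) when the earlier value is below 1.15 times the mean and the later value is above 1.15 times the mean. This is for one 5-minute block; I will repeat it for 18 5-minute blocks.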

block = regions2[i][j][k]  # regions2[i][j][k] refers to a 5-min block of EEG data
threshold = 1.15 * np.nanmean(block)  # computed once instead of on every iteration
burst = 0
for l in range(1, len(block)):
    # a burst starts where the signal crosses the threshold from below
    if block[l-1] < threshold and block[l] > threshold:
        burst += 1  # counts a burst
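The same count can probably be obtained without an explicit loop. The following is a sketch, not the asker's code: count_bursts and the demo array are hypothetical, and the 1.15 factor is taken from the question. It keeps the strict < and > comparisons of the loop, so values exactly equal to the threshold and NaN values are treated the same way.

```python
import numpy as np

def count_bursts(block, factor=1.15):
    """Count upward crossings of factor * nanmean(block) (hypothetical helper)."""
    block = np.asarray(block, dtype=float)
    threshold = factor * np.nanmean(block)
    below = block < threshold   # strict, as in the loop
    above = block > threshold   # strict, as in the loop
    # A burst starts at position l when sample l-1 is below and sample l is above.
    return int(np.count_nonzero(below[:-1] & above[1:]))

# Small made-up block: values rise above the threshold twice.
demo = np.array([1.0, 1.0, 3.0, 1.0, 3.0, 3.0, 1.0, 1.0])
n_bursts = count_bursts(demo)
print(n_bursts)
```

As with the loop, a burst that is already in progress at the first sample of the block is not counted, because there is no earlier sample below the threshold.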

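I thought I could find the index where a burst starts and the index where it ends (where the EEG value drops back below 1.15 times the mean). The duration in seconds would then be the difference between the two indexes.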
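The start/end idea can be sketched on a boolean "above threshold" array. This is one possible way to do it, under the assumption of one sample per second; burst_edges and the example array are hypothetical names introduced only for illustration. Padding with False on both sides makes bursts that touch the start or end of the block produce both a start and a stop.

```python
import numpy as np

def burst_edges(above):
    """Return (starts, stops) of True runs; stops are exclusive (hypothetical helper)."""
    above = np.asarray(above, dtype=bool)
    # Pad with False so runs touching either edge still have both a start and a stop.
    padded = np.concatenate(([False], above, [False]))
    change = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(change == 1)    # False -> True transitions
    stops = np.flatnonzero(change == -1)    # True -> False transitions
    return starts, stops

above = np.array([False, True, True, False, True, False, False, True])
starts, stops = burst_edges(above)
durations = stops - starts   # in samples, i.e. seconds at 1 Hz
print(starts, stops, durations)
```

Because the stop index is exclusive, stop - start is the number of samples in the burst, so a burst covering indexes 56 and 57 has start 56, stop 58 and a duration of 2.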

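However, I don't know how to do this, or whether it is the best approach. The bursts happen many times, so my end goal is the mean burst duration for each 5-minute block. I only started learning Python a few months ago, so any help is appreciated!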

1 answer:

Answer 0 (score: 1)

You can get these regions using numpy. I use part of another answer to a related question to do so.

The result, with the index positions and the number of points between them (console output and plot screenshot):

[image: resultimage]

The code that does this (you did not provide any data, so I use a sine wave as a basis):

import numpy as np
import datetime
import matplotlib.pyplot as plot

base = datetime.datetime(2019, 1, 1,8,0,0)
tarr = np.array([base + datetime.timedelta(seconds=i) for i in range(60*60)]) #  1 h

time = np.arange(0, 6*6, 0.01) * 0.2
amp = np.sin(time)

# get all indexes that interest me - yours would have the mean here
# not used in rest of code: contains all indexes that are above 0.75
indx = np.nonzero(abs(amp) > 0.75)
# looks like: (array([ 425,  426,  427, ..., 3597, 3598, 3599], dtype=int64),)


def contiguous_regions(cond):
    """Finds contiguous True regions of the boolean array "cond". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the (exclusive) end index.

    Credits: https://stackoverflow.com/a/4495197/7505395"""

    # Find the indices of changes in "cond"
    d = np.diff(cond)
    idx, = d.nonzero()

    # We need to start things after the change in "cond". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1

    if cond[0]:
        # If the start of cond is True prepend a 0
        idx = np.r_[0, idx]

    if cond[-1]:
        # If the end of cond is True, append the length of the array
        idx = np.r_[idx, cond.size]

    # Reshape the result into two columns
    idx.shape = (-1,2)
    return idx


condition = np.abs(amp) > 0.75

# create a plot visualization
fig, ax = plot.subplots()
ax.plot(tarr, amp)

# print values that fulfill our condition - also do some plotting
for start, stop in contiguous_regions(condition):
    segment = amp[start:stop]
    print(start, stop, f"Amount: {stop-start}")

    # plot some lines into the graph, off by 1 for stops in graph
    ax.vlines(x=tarr[start], ymin=amp[start]+(-0.1 if amp[start]>0 else 0.1),
                             ymax=1.0 if amp[start]>0 else -1.0, color='r')
    ax.vlines(x=tarr[stop-1], ymin=amp[start]+(-0.1 if amp[start]>0 else 0.1), 
                              ymax=1.0 if amp[start]>0 else -1.0, color='r')
    ax.hlines(y=amp[start]-0.08,xmin=tarr[start],xmax=tarr[stop-1])

# show plot    
plot.show()

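The trick is to apply np.diff to the boolean array ([False, True, ...]) that you get from your condition, and then take the indexes from the tuple returned by np.nonzero().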
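Applied to the question, the same trick might look like the sketch below. It is an assumption-laden illustration rather than part of the original answer: mean_burst_duration, the 1.15 default factor and the two made-up blocks are hypothetical, and returning NaN for a block without bursts is just one possible convention. With one sample per second the result is in seconds.

```python
import numpy as np

def mean_burst_duration(block, factor=1.15):
    """Mean length (in samples) of runs above factor * nanmean(block)."""
    block = np.asarray(block, dtype=float)
    cond = block > factor * np.nanmean(block)

    # np.diff on a boolean array is True wherever two neighbours differ.
    idx, = np.diff(cond).nonzero()
    idx = idx + 1
    if cond[0]:
        idx = np.r_[0, idx]          # block starts inside a burst
    if cond[-1]:
        idx = np.r_[idx, cond.size]  # block ends inside a burst

    regions = idx.reshape(-1, 2)     # columns: start, exclusive stop
    if len(regions) == 0:
        return float("nan")          # assumed convention: no bursts -> NaN
    return float((regions[:, 1] - regions[:, 0]).mean())

# Two made-up blocks standing in for the 5-minute EEG blocks.
blocks = [
    np.array([1.0, 3.0, 3.0, 1.0, 1.0, 3.0, 1.0, 1.0]),
    np.array([1.0, 1.0, 1.0, 1.0]),
]
per_block = [mean_burst_duration(b) for b in blocks]
print(per_block)
```

For the real data the list comprehension would run over the 18 blocks, for example over regions2[i][j][k] for the relevant i, j and k.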