Question

I'm not sure how to phrase my question, but here goes...

I have a huge list of 1s and 0s [total length = 53820].

Example of what the list looks like: [0,1,1,1,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1...........]

Below is a visualization.

x-axis: index of the element (from 0 to 53820)

y-axis: value at that index (i.e. 1 or 0)

Input plot -> (http://i67.tinypic.com/2h5jq5e.png)

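The plot clearly shows 3 dense regions where 1s occur more often. I have drawn on top of the plot to mark the visually dense regions (the ugly black lines on the plot). I want to know the x-axis index numbers of the start and end boundaries of the dense regions.

I have extracted the chunks of 1s and saved the start index of each chunk in a new list named 'starts'. That function returns a list of dictionaries like this: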

{'start': 0, 'count': 15, 'end': 16}, {'start': 2138, 'count': 3, 'end': 2142}, {'start': 2142, 'count': 3, 'end': 2146}, {'start': 2461, 'count': 1, 'end': 2463}, {'start': 2479, 'count': 45, 'end': 2525}, {'start': 2540, 'count': 2, 'end': 2543}
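The function that builds these dictionaries is not shown in the question. As a rough sketch of how runs of 1s could be extracted, assuming a hypothetical helper named `find_runs` (not the asker's code), something like the following would work. Note that `end` here is the exclusive end index (`start + count`), which appears to differ from the convention in the sample output above.

```python
from itertools import groupby


def find_runs(bits):
    """Return one dict per maximal run of 1s (hypothetical helper)."""
    runs = []
    pos = 0  # index of the first element of the current group
    for value, group in groupby(bits):
        length = sum(1 for _ in group)
        if value == 1:
            # 'end' is exclusive here: an assumption, not the asker's convention
            runs.append({'start': pos, 'count': length, 'end': pos + length})
        pos += length
    return runs


bits = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
runs = find_runs(bits)
starts = [r['start'] for r in runs]  # the 'starts' list used later on
print(runs)
print(starts)
```

`itertools.groupby` collapses consecutive equal values, so each group is either a run of 0s or a run of 1s and the running position only has to be advanced by the group length.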

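Then, after setting a threshold, I started comparing adjacent elements, which returns the apparent boundaries of the dense regions.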

THR = 2000
results = []
cues = {'start': 0, 'stop': 0}
result, starts = densest(preds)  # Function that returns the list of dictionaries shown above
cuestart = False  # Flag to check if looking for start or stop of dense boundary
for now, nextf in zip(starts, starts[1:]):  # walk over pairs of adjacent start indices
    if nextf - now > THR:
        if not cuestart:
            cues['start'] = nextf
            cues['stop'] = nextf
            cuestart = True
        else:  # cuestart is already set
            cues['stop'] = now
            cuestart = False
            results.append(cues)
            cues = {'start': 0, 'stop': 0}

print('\n', results)

The output and the corresponding plot look like this.

[{'start': 2138, 'stop': 6654}, {'start': 23785, 'stop': 31553}, {'start': 38765, 'stop': 38765}]

Output plot -> (http://i63.tinypic.com/23hom6o.png)

This method fails to get the last dense region, as can be seen in the plot, and it also fails on other data of a similar kind.
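One plausible reason, judging only from the loop shown above, is that a region is closed only when a second large gap is encountered, so a region that is still open when the data ends never receives a proper stop value. A minimal sketch of a gap-based grouping that flushes the last open region might look like this. The name `dense_regions` and the `min_runs` parameter are assumptions made for illustration, and `stop` is the start index of the last run in the region, not the index of its last 1.

```python
def dense_regions(starts, max_gap, min_runs=2):
    """Group sorted start indices into regions split at gaps > max_gap.

    Hypothetical helper: regions with fewer than min_runs members are
    treated as isolated runs and dropped.
    """
    regions = []
    if not starts:
        return regions
    first = prev = starts[0]
    members = 1
    for s in starts[1:]:
        if s - prev > max_gap:
            # A large gap closes the current region.
            if members >= min_runs:
                regions.append({'start': first, 'stop': prev})
            first = s
            members = 0
        prev = s
        members += 1
    # Flush the region that is still open when the data ends.
    if members >= min_runs:
        regions.append({'start': first, 'stop': prev})
    return regions


starts = [0, 10, 25, 40, 500, 510, 530, 1200, 1210, 1215]
regions = dense_regions(starts, max_gap=100)
print(regions)
```

The threshold still has to be chosen by hand, so this only addresses the missing last region, not the sensitivity to `THR`.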

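P.S. I have also tried 'KDE' and 'distplot' on this data using seaborn, but those give me plots directly and I am unable to extract the boundary values from them. The link for that question is here (Getting dense region boundary values from output of KDE plot)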

Answer 1

OK, you need an answer...

First, the imports (we are going to use LineCollections)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

Next, the definition of constants

N = 1001
np.random.seed(20190515)

and the generation of fake data

x = np.linspace(0, 1, N)
# probability of a 1: 0.95 for 0.4 <= x < 0.7, 0.02 elsewhere
prob = np.where(x<0.4, 0.02, np.where(x<0.7, 0.95, 0.02))
y = np.where(np.random.rand(N)<prob, 1, 0)

Here we create the line collection; sticks is an N×2×2 array containing the start and end points of our vertical lines

sticks = np.array(list(zip(zip(x, np.zeros(N)), zip(x, y))))
lc = LineCollection(sticks)

Finally, the cumulative sum, here normalized to have the same scale as the vertical lines

cs = (y-0.5).cumsum()
csmin, csmax = min(cs), max(cs)
cs = (cs-csmin)/(csmax-csmin) # normalized to 0 ÷ 1

We just have to plot our results

f, a = plt.subplots()
a.add_collection(lc)
a.plot(x, cs, color='red')
a.grid()
a.autoscale()

Here is the plot, and here is a detail of the stop zone.

You can smooth the cs data and then locate the positions of its extrema. Should you have a problem with this last step, please ask another question.
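To illustrate the last step, here is a minimal sketch in plain Python (no NumPy, no smoothing) of locating the extrema of the cumulative sum. Because each sample contributes +0.5 for a 1 and -0.5 for a 0, the sum falls in sparse zones and rises in dense ones, so a dense region starts right after a minimum and ends at the following maximum. The toy data and the names `toy_value`, `region_start`, and `region_end` are assumptions for illustration; the data is deterministic and has a single dense region on indices 50 to 79.

```python
from itertools import accumulate

N = 100


def toy_value(i):
    """Deterministic toy data (an assumption): dense on 50..79, sparse elsewhere."""
    dense = 50 <= i < 80
    exception = (i % 10 == 5)  # one 0 per ten samples when dense, one 1 when sparse
    return int(dense != exception)


y = [toy_value(i) for i in range(N)]

# Cumulative sum of (y - 0.5): falls in sparse zones, rises in dense zones.
cs = list(accumulate(v - 0.5 for v in y))

# The dense region starts just after the global minimum ...
i_min = min(range(N), key=cs.__getitem__)
# ... and ends at the maximum that follows that minimum.
i_max = max(range(i_min, N), key=cs.__getitem__)

region_start, region_end = i_min + 1, i_max
print(region_start, region_end)
```

On this toy data the sketch reports 50 and 79. Real data is noisy and may contain several dense regions, in which case the cumulative sum would need to be smoothed first and local rather than global extrema located.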

Extracting the boundaries of dense regions of 1s in a huge list of 1s and 0s
