Question

This is my data:

id
123246512378
632746378456
378256364036
159204652855
327445634589

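I want to remove the rows whose id contains two runs of three consecutive digits, for example 123 2465 123 78 and 3274 456 | 345 89, so that the data is reduced to: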

id
632746378456
378256364036
159204652855

Answer 1

First, convert df.id into an array of single-digit integers.

import numpy as np

a = np.array(list(map(list, map(str, df.id))), dtype=int)
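To see what this conversion produces, here is a small sketch on the sample ids from the question. A plain list stands in for df.id (that substitution is my assumption, not part of the answer), and the approach relies on every id having the same number of digits so the result is a rectangular array.

```python
import numpy as np

# Sample ids from the question; a plain list stands in for df.id (assumption).
ids = [123246512378, 632746378456, 378256364036, 159204652855, 327445634589]

# One row per id, one column per digit.
a = np.array(list(map(list, map(str, ids))), dtype=int)

print(a.shape)  # (5, 12)
print(a[0])     # [1 2 3 2 4 6 5 1 2 3 7 8]
```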

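Then check whether each digit is one less than the next digit, and whether that next digit is in turn one less than the one after it.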

first = a[:, :-2] == a[:, 1:-1] - 1
second = a[:, 1:-1] == a[:, 2:] - 1
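As a worked example, the two comparisons can be run on a single id to see where the runs start. This is only an illustrative sketch; the name triple_starts is mine and does not appear in the answer.

```python
import numpy as np

# Digits of 327445634589, one of the ids the question wants removed.
a = np.array([[3, 2, 7, 4, 4, 5, 6, 3, 4, 5, 8, 9]])

first = a[:, :-2] == a[:, 1:-1] - 1   # a[i] + 1 == a[i+1]
second = a[:, 1:-1] == a[:, 2:] - 1   # a[i+1] + 1 == a[i+2]

# Indices where a run of three ascending consecutive digits begins.
triple_starts = np.flatnonzero((first & second)[0])
print(triple_starts)  # [4 7] -> "456" starts at index 4, "345" at index 7
```

Because the windows overlap, a run of four digits such as 1234 would be counted as two runs of three.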

Finally, build a mask that keeps only the rows where this happens fewer than two times.

mask = np.count_nonzero(first & second, axis=1) < 2
df[mask]

             id
1  632746378456
2  378256364036
3  159204652855
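For a sanity check that does not depend on NumPy, the same rule can be written as a plain Python loop. This is a different, slower technique than the vectorised one above, and count_ascending_triples is a hypothetical helper name; it does have the advantage of working when ids have different lengths.

```python
def count_ascending_triples(number):
    """Count positions where three digits in a row each increase by 1."""
    digits = [int(ch) for ch in str(number)]
    return sum(
        digits[i] + 1 == digits[i + 1] and digits[i + 1] + 1 == digits[i + 2]
        for i in range(len(digits) - 2)
    )

# Sample ids from the question.
ids = [123246512378, 632746378456, 378256364036, 159204652855, 327445634589]

# Keep ids with fewer than two runs of three consecutive digits.
kept = [n for n in ids if count_ascending_triples(n) < 2]
print(kept)  # [632746378456, 378256364036, 159204652855]
```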

Answer 2

Not sure whether this is faster than @piRSquared's answer, since I can't generate my own test data with pandas, but it seems like it should be:

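import numpy as np

def mask_cons(df):
    # same as piRSquared, but float
    a = np.array(list(map(list, df.id.astype(str))), dtype=float)
    # central difference at each interior digit; 3 consecutive values give +/-1
    g_a = np.gradient(a, axis=1)[:, 1:-1]
    # +/-1 alone also matches e.g. 1, 5, 3, so the forward step must agree
    fwd = np.diff(a, axis=1)[:, 1:]
    # keep rows with fewer than two runs (ascending or descending)
    mask = ((np.abs(g_a) == 1) & (fwd == g_a)).sum(1) < 2
    # this assumes 4 consecutive values count as 2 instances of 3 consecutive values
    # otherwise more complicated methods are needed (probably @jit)
    return df[mask]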
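One caveat worth checking with any gradient-based test: in the interior, np.gradient computes a central difference, so a value of +/-1 only says that the two neighbours differ by 2, not that all three digits are consecutive. A minimal sketch on two hand-picked rows shows the false positive and one possible way to rule it out by also comparing the forward difference.

```python
import numpy as np

a = np.array([[1.0, 5.0, 3.0],    # not consecutive
              [1.0, 2.0, 3.0]])   # consecutive

# Central difference at the middle digit: (right - left) / 2.
g = np.gradient(a, axis=1)[:, 1:-1]
print(g.ravel())  # [1. 1.] -> both rows look the same to a gradient test

# Requiring the forward difference to equal the gradient removes the false positive.
fwd = np.diff(a, axis=1)[:, 1:]
print(((np.abs(g) == 1) & (fwd == g)).ravel())  # [False  True]
```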

How to eliminate data containing two runs of three consecutive digits in Python?
