How to count adjacent repeated elements in an array?

Date: 2019-01-07 13:53:42

Tags: python arrays list numpy counter

I have an array of 0s and 1s like this:

[0,0,1,1,1,0,0,0,0,1,1,0,0]

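I want to define a function that takes this array as input and outputs an array of the same length. The output should hold the count of adjacent 1s at the index where each run of 1s begins, and 0 everywhere else. So the output would be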

[0,0,3,0,0,0,0,0,0,2,0,0,0]

The 1 appears 3 times in a row starting at index 2, and 2 times in a row starting at index 9.

Is it possible to do this with numpy? If not, is there some (efficient) Pythonic way to do it?

8 answers:

Answer 0 (score: 3)

Here is a solution that uses only vectorized operations and no list iteration:

import numpy as np

data = np.array([0,0,1,1,1,0,0,0,0,1,1,0,0])
output = np.zeros_like(data)

where = np.where(np.diff(data))[0]
vals = where[1::2] - where[::2]
idx = where[::2] + 1

output[idx] = vals
output
# array([0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0])
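One caveat: the slicing where[1::2] - where[::2] assumes that every run of 1s has both a start and an end inside the array. In other words, the data must begin and end with 0. A minimal sketch, assuming a 1-D input of only 0s and 1s, pads a 0 onto each side first so that edge runs are handled too. The name run_lengths_at_starts is a hypothetical helper, not part of the original answer.

```python
import numpy as np


def run_lengths_at_starts(data):
    """Put the length of each run of 1s at the run's first index, 0 elsewhere.

    Assumes `data` is a 1-D sequence containing only 0s and 1s.
    """
    data = np.asarray(data)
    # Pad with a 0 on both sides so every run of 1s has a rising edge
    # (0 -> 1) and a falling edge (1 -> 0), even at the array boundaries.
    padded = np.concatenate(([0], data, [0]))
    edges = np.flatnonzero(np.diff(padded))
    # Edges alternate start, end, start, end, ...; each end index is
    # one past the last 1 of its run, so end - start is the run length.
    starts, ends = edges[::2], edges[1::2]
    output = np.zeros_like(data)
    output[starts] = ends - starts
    return output
```

For example, run_lengths_at_starts([1, 1, 0, 1]) gives array([2, 0, 0, 1]). On the same input, the unpadded version silently returns [0, 0, 1, 0]. On an input such as [1, 1, 0], it raises a shape-mismatch error.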

Answer 1 (score: 2)

Using the itertools module:

from itertools import chain, groupby

A = [0,0,1,1,1,0,0,0,0,1,1,0,0]

def get_lst(x):
    values = list(x[1])
    return [len(values)] + [0]*(len(values) - 1) if x[0] else values

res = list(chain.from_iterable(map(get_lst, groupby(A))))

# [0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]

Answer 2 (score: 1)

You could use groupby to group the consecutive elements:

from itertools import groupby

a = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]


def groups(lst):
    result = []
    for key, group in groupby(lst):
        if not key:  # is a group of zeroes
            result.extend(list(group))
        else:  # is a group of ones
            count = sum(1 for _ in group)
            if count > 1:  # if more than one
                result.append(count)
                result.extend(0 for _ in range(count - 1))
            else:
                result.append(0)
    return result


print(groups(a))

Output

[0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]

A shorter (more Pythonic?) version is the following:

def groups(lst):
    for key, group in groupby(lst):
        count = sum(1 for _ in group)
        if key and count > 1:
            yield count  # a run of 2+ ones: its length first...
            count -= 1   # ...then zeros for the rest of the run
        yield from (0 for _ in range(count))


print(list(groups(a)))
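Note that this answer deliberately reports a lone 1 (a run of length 1) as 0, reading "adjacent" strictly. The other answers report it as 1. A minimal sketch can make the choice explicit with a hypothetical min_run parameter, which is not part of the original answer:

```python
from itertools import groupby


def run_counts(lst, min_run=1):
    """Write each run of 1s' length at its first index, 0 elsewhere.

    `min_run` is a hypothetical parameter: runs of 1s shorter than it
    are reported as all zeros (min_run=2 matches the answer above).
    """
    out = []
    for key, group in groupby(lst):
        count = sum(1 for _ in group)
        if key and count >= min_run:
            out.append(count)             # run length at the first index
            out.extend([0] * (count - 1))  # zeros for the rest of the run
        else:
            out.extend([0] * count)        # zeros, or a run that is too short
    return out
```

With min_run=2 it reproduces this answer's behaviour: run_counts([1, 0, 1, 1], min_run=2) gives [0, 0, 2, 0]. With the default min_run=1 it matches the other answers and gives [1, 0, 2, 0].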

Answer 3 (score: 1)

Here's one way using numpy and a list comprehension:

In [23]: a = np.array([0,0,1,1,1,0,0,0,0,1,1,0,0])
In [24]: np.hstack([x.sum() if x[0] == 1 else x for x in np.split(a, np.where(np.diff(a) != 0)[0]+1)])
Out[24]: array([0, 0, 3, 0, 0, 0, 0, 2, 0, 0])

The logic:

  1. Find the indices where runs of consecutive 1s begin and end (i.e. wherever the value changes).
  2. Split your array at those indices.
  3. Sum the sub-arrays of ones, and leave the sub-arrays of zeros as they are.
  4. Flatten the result using np.hstack.
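The four steps above can be traced on the example input as follows. The intermediate names boundaries, runs, pieces and result are only for illustration, and the values in the comments were worked out by hand for this input:

```python
import numpy as np

a = np.array([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0])

# Step 1: np.diff is non-zero where the value changes; +1 converts each
# position into the index at which the next run begins.
boundaries = np.where(np.diff(a) != 0)[0] + 1   # [2, 5, 9, 11]

# Step 2: split into runs of identical values.
runs = np.split(a, boundaries)
# [0, 0], [1, 1, 1], [0, 0, 0, 0], [1, 1], [0, 0]

# Step 3: collapse each run of 1s to its sum; keep runs of 0s unchanged.
pieces = [x.sum() if x[0] == 1 else x for x in runs]

# Step 4: flatten back into one array (np.hstack accepts the scalars too).
result = np.hstack(pieces)   # [0, 0, 3, 0, 0, 0, 0, 2, 0, 0]
```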

If you want to replace the remaining ones with 0s (so the output has the same length as the input), just do the following:

In [28]: np.hstack([[x.sum(), *[0]*(len(x) -1)]  if x[0] == 1 else x for x in np.split(a, np.where(np.diff(a) != 0)[0]+1)])
Out[28]: array([0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0])

[0]*(len(x) - 1) creates the zeros you need, and unpacking it in place with * puts them right after x.sum().

If you ever want a pure Python approach, here's one way using itertools.groupby:

In [63]: def summ_cons(li):
    ...:     for k,g in groupby(li) :
    ...:            if k:
    ...:               s = sum(g)
    ...:               yield s
    ...:               yield from (0 for _ in range(s-1))
    ...:            yield from g
    ...:            


In [65]: list(summ_cons(a))
Out[65]: [0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]

Answer 4 (score: 1)

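Using pandas: take advantage of pandas' count, which counts only non-NaN values. Use mask to turn the 0s into NaN, then group by the changes in the values of s.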

import pandas as pd
l = [0,0,1,1,1,0,0,0,0,1,1,0,0]
s = pd.Series(l)
g = s.diff().ne(0).cumsum()
s.mask(s==0).groupby(g).transform('count').mask(g.duplicated(), 0).tolist()

Output:

[0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]
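To show how the pieces fit together, here is the same chain broken into named intermediate steps. The names ones_only, counts and result are only illustrative. The value of each step, worked out by hand for the example input, is noted in the comments:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0])

# A new group id starts wherever the value changes. The first diff is NaN,
# and NaN != 0 is True, so the first element opens group 1.
g = s.diff().ne(0).cumsum()
# g: 1 1 2 2 2 3 3 3 3 4 4 5 5

# Turn the 0s into NaN so that 'count' (which skips NaN) only counts 1s.
ones_only = s.mask(s == 0)

# Broadcast each group's number of non-NaN values back to every row.
counts = ones_only.groupby(g).transform('count')
# counts: 0 0 3 3 3 0 0 0 0 2 2 0 0

# Keep the count only on the first row of each group; zero out the rest.
result = counts.mask(g.duplicated(), 0).tolist()
# result: [0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]
```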

Answer 5 (score: 0)

Using groupby:

from itertools import groupby
a=[0,0,1,1,1,0,0,0,0,1,1,0,0]
lst_of_tuples=[]
for k,v in groupby(a):
    lst_of_tuples.append((k,len(list(v))))
ans=[]
for k,v in lst_of_tuples:
    temp=[v if k==1 else k]
    for i in range(v-1):
        temp.append(0)
    ans=ans+temp

Output (the final value of ans):

[0,0,3,0,0,0,0,0,0,2,0,0,0]

Answer 6 (score: 0)

TL;DR: this gives you the desired output:

import itertools

input = [0,0,1,1,1,0,0,0,0,1,1,0,0]   
result = []

for k, g in itertools.groupby(input):
    if k == 1:
        ll = len(list(g))
        result.extend([ll,] + [0 for _ in range(ll-1)])
    else:
        result.extend(list(g)) 

gives you:

[0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]

Explanation:

itertools has a groupby function that splits the input into runs of "identical" elements:

for k, g in itertools.groupby(input):
    print(k, list(g))

gives you:

0 [0, 0]
1 [1, 1, 1]
0 [0, 0, 0, 0]
1 [1, 1]
0 [0, 0]

So k is the key (the element of the input sequence), and g is the group.

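So the if statement does one of two things. For a run of 0s, it appends the run from the input unchanged. For a run of 1s, it appends the run's length followed by enough 0s to fill out the rest of the run.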

Answer 7 (score: 0)

Another option, with no dependencies:

A good old while loop over the indices (which can sometimes be faster than numpy):

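def count_same_adjacent_non_zeros(iterable):
  i, x, size = 0, 0, len(iterable)
  while i < size-1:
    if iterable[i] != iterable[i+1]:
      # the run iterable[x..i] has just ended
      tmp = iterable[x:i+1]
      if not iterable[i] == 0:
        tmp = [len(tmp)] + [0 for _ in range(i-x)]
      for e in tmp: yield e
      x = i + 1
    i += 1
  # handle the final run, which the loop above never closes
  tmp = iterable[x:size]
  if size and iterable[-1] != 0:
    tmp = [len(tmp)] + [0 for _ in range(size-1-x)]
  for e in tmp: yield e


array = [0,0,1,1,1,0,0,0,0,1,1,0,0]
print(list(count_same_adjacent_non_zeros(array)))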

#=> [0, 0, 3, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0]

It also works with array = [0,0,4,4,4,0,0,0,0,5,5,0,0].