Question

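EDITED How can I count consecutive characters in Python to see how many times each unique digit repeats before the next unique digit? I'm very new to the language, so I'm looking for something simple.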

At first I thought I could do something like this:

word = '1000'

counter=0
print range(len(word))


for i in range(len(word)-1):
    while word[i]==word[i+1]:
        counter +=1
        print counter*"0"
    else:
        counter=1
        print counter*"1"

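That way I could see how many times each unique digit repeats. But of course this goes out of range when i reaches the last value.

In the example above, I want Python to tell me that 1 repeats 1 time and 0 repeats 3 times. However, the code above fails because of my while statement.

I know you can do this with built-in functions, and I would prefer that kind of solution. Does anyone have any insight?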

Answer 1

Consecutive counts:

Ooh, nobody has posted itertools.groupby yet!

s = "111000222334455555"

from itertools import groupby

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

After this, result looks like:

[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]

You can format it with something like:

", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"

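Total counts:

Someone in the comments was concerned that you want a total count of each digit, so that "11100111" -> {"1":6, "0":2}. In that case you want to use collections.Counter: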

from collections import Counter

s = "11100111"
result = Counter(s)
# {"1":6, "0":2}

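Your method:

As many have pointed out, your method fails because you are looping over range(len(s)) but looking at s[i+1]. This leads to an off-by-one error when i points at the last index of s, so s[i+1] raises an IndexError. One way to fix this is to loop over range(len(s)-1), but it is more pythonic to generate something to iterate over.

For strings that aren't absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do: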

counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1

The only problem is that you have to special-case the last character if it is unique. That can be fixed with itertools.zip_longest:

import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1  # reset for the next run

If you do have a truly huge string and can't stand to hold two copies of it in memory at once, you can use the itertools recipe pairwise.

def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
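
The loop body elided above can be filled in the same way as in the zip_longest version. A minimal sketch, assuming the input never contains None itself (None is used as the fill value); the function name run_lengths is made up for illustration:

```python
import itertools

def pairwise(iterable):
    """Yield (current, next) pairs; the last pair is (last_item, None)."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

def run_lengths(s):
    """Return [(char, run_length), ...] for consecutive runs in s."""
    counts = []
    count = 1
    for a, b in pairwise(s):
        if a == b:
            count += 1
        else:
            # The run of `a` ends here (b is a different char, or None at the end).
            counts.append((a, count))
            count = 1
    return counts

print(run_lengths("1000"))
```

For the string from the question this should report that 1 repeats once and 0 repeats three times.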

Answer 2

A solution "that way", using only basic statements:

word="100011010" #word = "1"
count=1
length=""
if len(word)>1:
    for i in range(1,len(word)):
       if word[i-1]==word[i]:
          count+=1
       else :
           length += word[i-1]+" repeats "+str(count)+", "
           count=1
    length += ("and "+word[i]+" repeats "+str(count))
else:
    i=0
    length += ("and "+word[i]+" repeats "+str(count))
print (length)

Output:

'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'

# with word = "1": 'and 1 repeats 1'

Answer 3

Totals (without sub-groupings)

#!/usr/bin/python3 -B

charseq = 'abbcccdddd'
distros = { c:1 for c in charseq  }

for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1

print(distros)

I'll give a brief explanation of the interesting lines.

distros = { c:1 for c in charseq  }

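The line above is a dictionary comprehension. It iterates over the characters in charseq and creates a key/value pair for each one, where the key is the character and the value is the number of times it has been encountered so far (initially 1).

Then comes the loop: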

for c in range(len(charseq)-1):

We go from 0 to length - 1 so that the c+1 index in the loop body does not go out of range.

if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1

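At this point, every match we encounter is a consecutive one, so we simply add 1 to that character's key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustration purposes):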

# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1

# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1

You can see the program output below, with the correct counts:

➜  /tmp  ./counter.py
{'b': 2, 'a': 1, 'c': 3, 'd': 4}

Answer 4

You only need to change len(word) to len(word) - 1. That said, you could also use the fact that False has the value 0 and True has the value 1 when used with sum:

sum(word[i] == word[i+1] for i in range(len(word)-1))

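For word = '1000' this produces the sum of (False, True, True), where False counts as 0 and True as 1, which is what you're after.

If you want this to be safe, you need to guard against empty words (index -1 access):

sum(word[i] == word[i+1] for i in range(max(0, len(word)-1)))

And this can be improved with zip: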

sum(c1 == c2 for c1, c2 in zip(word[:-1], word[1:]))
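
To make clear what the sum measures, here is a small sketch wrapping the zip version in a function (the name adjacent_equal_pairs is made up for illustration). It counts how many adjacent pairs are equal across the whole string, not a separate run length per digit:

```python
def adjacent_equal_pairs(word):
    """Count positions i where word[i] == word[i+1]."""
    # zip stops at the shorter argument, so word[:-1] is not strictly needed
    # and the empty string is handled without any special case.
    return sum(c1 == c2 for c1, c2 in zip(word, word[1:]))

print(adjacent_equal_pairs("1000"))  # two equal neighbours: the 0-0 pairs
```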

Answer 5

Here is my simple code for finding the maximum number of consecutive 1s in a binary string in Python 3:

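
A minimal sketch of the kind of scan this answer describes, written from the description only and not taken from the answerer's own code. It assumes the input is a string of '0' and '1' characters, and the function name is hypothetical:

```python
def longest_run_of_ones(bits):
    """Return the length of the longest run of '1' characters in bits."""
    best = 0
    current = 0
    for ch in bits:
        if ch == "1":
            current += 1
            best = max(best, current)
        else:
            # Any other character ends the current run.
            current = 0
    return best

# bin(13) is '0b1101'; strip the '0b' prefix before scanning.
print(longest_run_of_ones(bin(13)[2:]))
```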

Answer 6

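A unique approach: if you only want to count consecutive 1s, use bit magic. The idea is based on the observation that if we AND a bit sequence with a shifted version of itself, we effectively remove the trailing 1 from every sequence of consecutive 1s.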

  11101111   (x)
& 11011110   (x << 1)
----------
  11001110   (x & (x << 1)) 
    ^    ^
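    |    |
trailing 1 removed

So the operation x = (x & (x << 1)) reduces the length of every sequence of 1s by one in the binary representation of x. If we keep doing this in a loop, we end up with x = 0. The number of iterations required to reach 0 is the length of the longest consecutive sequence of 1s.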
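
The loop described above can be sketched in Python as follows. This assumes x is a non-negative int (with a negative int the loop would not terminate), and the function name is made up for illustration:

```python
def longest_run_of_ones_bits(x):
    """Length of the longest run of 1 bits in a non-negative int x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    iterations = 0
    while x:
        # Each pass shortens every run of 1s by exactly one bit.
        x &= x << 1
        iterations += 1
    return iterations

print(longest_run_of_ones_bits(0b11101111))
```

Python ints are unbounded, so the left shift never overflows; the AND with the original value discards any extra high bit.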

Answer 7

If we want to count consecutive characters without looping, we can use pandas:

In [1]: import pandas as pd

In [2]: sample = 'abbcccddddaaaaffaaa'
In [3]: d = pd.Series(list(sample))

In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('a', 4), ('f', 2), ('a', 3)]

The key is to find the first elements that differ from their previous value and then do the proper grouping in pandas:

In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))

In [7]: d.ne(d.shift())
Out[7]:
0     True
1     True
2    False
3     True
dtype: bool

In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0    1
1    2
2    2
3    3
dtype: int32

Answer 8

No need to count or group. Just note the indices where a change occurs and subtract consecutive indices.

w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]

print(dw)  # digits
['1', '0', '2', '3', '4', '5']
print(cw)  # counts
[3, 3, 3, 2, 2, 5]

w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
print(dw)  # characters
print(cw)  # counts

['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
[2, 1, 1, 3, 1, 1, 2, 5, 3]
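
The same boundary idea can be packaged into a function that pairs each character with its count. A minimal sketch; the function name is made up for illustration, and the empty-string guard is an addition, since w[-1] in the snippet above would raise IndexError on an empty string:

```python
def runs_from_change_indices(w):
    """Return [(char, run_length), ...] using indices where the char changes."""
    if not w:
        return []
    # Start index of every run, plus len(w) as a final sentinel.
    boundaries = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]] + [len(w)]
    # Each run spans from one boundary to the next.
    return [(w[start], end - start) for start, end in zip(boundaries, boundaries[1:])]

print(runs_from_change_indices("111000222334455555"))
```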

Answer 9

A one-liner that returns the number of consecutive characters, with no imports:

def f(x):s=x+" ";t=[x[1] for x in zip(s[0:],s[1:],s[2:]) if (x[1]==x[0])or(x[1]==x[2])];return {h: t.count(h) for h in set(t)}

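This returns the number of times each repeated character in the input appears inside a run of consecutive characters.

Alternatively, this accomplishes the same thing, albeit much more slowly: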

def A(m):t=[thing for x,thing in enumerate(m) if thing in [(m[x+1] if x+1<len(m) else None),(m[x-1] if x-1>0 else None)]];return {h: t.count(h) for h in set(t)}

In terms of performance, I ran them with

import copy
import timeit
import urllib.request

site = 'https://web.njit.edu/~cm395/theBeeMovieScript/'
s = urllib.request.urlopen(site).read(100_000)
s = str(copy.deepcopy(s))
print(timeit.timeit('A(s)',globals=locals(),number=100))
print(timeit.timeit('f(s)',globals=locals(),number=100))

which resulted in:

12.528256356999918
5.351301653001428

This method can definitely be improved, but without using any external libraries this was the best I could come up with.
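
For comparison, a more readable sketch of what the one-liners appear to compute: for each character, how many of its occurrences sit inside a run of two or more. It uses itertools.groupby, so it is not import-free. The name chars_in_runs is made up for illustration, and its results are not guaranteed to match f or A on every input, particularly at the edges of the string:

```python
from itertools import groupby

def chars_in_runs(text):
    """Map each char to the number of its occurrences inside runs of length >= 2."""
    totals = {}
    for ch, group in groupby(text):
        run_length = sum(1 for _ in group)
        if run_length >= 2:
            # Isolated characters (runs of length 1) are ignored.
            totals[ch] = totals.get(ch, 0) + run_length
    return totals

print(chars_in_runs("aabcccaa"))
```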
