Question

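Say I want to match a substring if it contains a certain number of a given character. I don't know the exact count in advance; I only know that it is non-negative. How would I write this regular expression?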

from sys import stdin
import re
k = int(raw_input())
combo = re.compile(r'(?=(.*1.*){k})')
print [ s for s in combo.findall(stdin.readline().strip()) ]

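Is this possible? If so, how would I do it?

Edit: example input: k = 2, string = 01010

Expected output: "101", "0101", "1010", "01010"

So each of these substrings contains exactly 2 occurrences of the character "1".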

Answer 1

Regular expression patterns are just strings, so feel free to build them with whatever string-formatting construct you like:

combo = re.compile(r'(?=(.*1.*){%d})' % k)
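
To make the formatting point concrete, here is a minimal sketch of three ways to splice k into the repetition braces. The pattern `(?:[^1]*1){k}[^1]*` and the helper name `has_exactly_k_ones` are my own illustration, not something from the question; used with `fullmatch`, it only checks whether a whole string contains exactly k ones. Note that `str.format` and f-strings need the literal braces doubled.

```python
import re

k = 2

# Three equivalent ways to put k inside the {…} quantifier.
with_percent = r'(?:[^1]*1){%d}[^1]*' % k
with_format = r'(?:[^1]*1){{{}}}[^1]*'.format(k)   # {{ and }} are literal braces
with_fstring = rf'(?:[^1]*1){{{k}}}[^1]*'          # same escaping rule as str.format


def has_exactly_k_ones(s, k):
    """Hypothetical helper: True if s contains exactly k '1' characters."""
    pattern = re.compile(r'(?:[^1]*1){%d}[^1]*' % k)
    return pattern.fullmatch(s) is not None
```

All three strings come out as `(?:[^1]*1){2}[^1]*`. This checks one candidate string at a time; it does not enumerate overlapping substrings by itself.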

As for your edited question, I can't find a simple way to do that with a regexp. How about the following?

def all_substrings(s):
    m = len(s)
    for i in range(m):
        for j in range(i, m):
            yield s[i:j+1]

s = '01010'
print [x for x in all_substrings(s) if x.count('1') == 2]
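
For readers on Python 3, the same brute-force idea can be sketched as follows. The function name `substrings_with_k` and its `ch` parameter are hypothetical, chosen for illustration. It enumerates all O(n^2) substrings and counts characters in each, so it is only reasonable for short inputs.

```python
def substrings_with_k(s, k, ch='1'):
    """Return every substring of s containing exactly k occurrences of ch."""
    n = len(s)
    return [
        s[i:j]
        for i in range(n)             # start index
        for j in range(i + 1, n + 1)  # end index (exclusive)
        if s[i:j].count(ch) == k
    ]


print(substrings_with_k('01010', 2))
# ['0101', '01010', '101', '1010']
```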

Answer 2

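So, after all these years, someone upvoted this question.

At first I couldn't remember where I had originally seen this problem when I posted the question on SO. No, it wasn't homework, as this comment suggested. Just by typing a few keywords into Google, I found the problem statement in the following places: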

https://leetcode.com/problems/count-binary-substrings/
https://www.codechef.com/problems/STRSUB
https://codeforces.com/problemset/problem/165/C (I believe this is the specific one I was working on)

I was right about Codeforces. I see that I had actually come up with a solution and submitted it. Here is my fastest solution: https://codeforces.com/contest/165/submission/4171748

k = int(raw_input())
 
def stable_search( zero, bin_num ):
    import collections
    c_one = ans = temp_ans = temp_z = 0
    c_zero = collections.deque()
    for f in bin_num[zero:]:
        if f == '1':
            c_zero.append(zero); zero = 0
            c_one = -~c_one
            if c_one >= k:
                ans = ans + ( temp_z * temp_ans ) + temp_z
                temp_ans = 0; temp_z = -~c_zero.popleft()
        else: temp_ans, zero = -~temp_ans, -~zero
    return ans + ( temp_z * temp_ans ) + temp_z
 
def mid(bin_num):
    return stable_search(bin_num.find('1'), bin_num)
 
def find_zeros(bin_num):
    import re
    return sum((len(sed)*-~len(sed))>>1 for sed in re.findall( '0+', bin_num))
 
if k == 0: print find_zeros(raw_input())
else: print mid(raw_input())

Yikes! Look at that mess (I must have just learned bitwise operations). By the way, -~n simply adds one to n.
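
A quick sketch of why that works: for Python integers, `~n` equals `-n - 1`, so negating it gives `n + 1`. The function name `inc` is just for illustration.

```python
def inc(n):
    """Obfuscated increment: ~n == -n - 1, hence -~n == n + 1."""
    return -~n


for n in (-3, 0, 7):
    assert ~n == -n - 1
    assert inc(n) == n + 1
```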

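Looking at the code again, I see that a regular expression is used for one part of the problem (when k is 0), but the rest is done with a technique I'm not sure I fully understand anymore. It looks like a two-pointer problem, but I think there may be more to it, especially given the time limit.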
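
For the k = 0 part, the idea can be sketched like this: a substring with no "1" has to lie entirely inside one run of zeros, and a run of length n contains n(n+1)/2 non-empty substrings. The function name `count_zero_only_substrings` is my own; the counting matches what `find_zeros` above computes.

```python
import re


def count_zero_only_substrings(bin_num):
    """Count non-empty substrings of bin_num that contain no '1'."""
    return sum(
        len(run) * (len(run) + 1) // 2   # substrings inside one run of zeros
        for run in re.findall(r'0+', bin_num)
    )
```

For example, '00100' has two runs of length 2, each contributing 3 substrings, for a total of 6.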

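As you can see, the solution runs in O(N) time and is written in Python 2 (at the time there were rumors that Python 3 was slower than Python 2, so everyone religiously stuck with Python 2, yours truly included). Let's see whether rewriting it in Python 3 really makes it slower: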

https://codeforces.com/contest/165/submission/115388714

Nope! It got faster.

#!/usr/bin/python3
import collections
import re

def find_bin_ksubs (k: int, bin_num: str) -> int:
    tmp_z = tmp_count = count = count_1 = 0
    zeros = collections.deque()
    count_0 = bin_num.find('1')
    if count_0 == -1:
        return 0

    for b in bin_num[count_0:]:
        if b == '1':
            zeros.append(count_0)
            count_0 = 0
            count_1 += 1
            if count_1 >= k:
                count = count + (tmp_z * tmp_count) + tmp_z
                tmp_count = 0
                tmp_z = zeros.popleft() + 1
        else:
            count_0 += 1
            tmp_count += 1

    return count + (tmp_z * tmp_count) + tmp_z


def find_empties (bin_num: str) -> int:
    reg = re.compile(r'0+')
    return sum((count ** 2 + count) >> 1 \
        for zeros in reg.findall(bin_num) if (count := len(zeros)))


if __name__ == '__main__':
    if (k := int (input ())) == 0:
        print (find_empties(input()))
    else:
        print (find_bin_ksubs(k, input()))
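
If you want something to cross-check the answer against, here is a sketch of a different O(N) technique, prefix counts. To be clear, this is not how the submissions above work; the name `count_k_ones` is hypothetical. A substring has exactly k ones when the number of ones before its end minus the number of ones before its start equals k, so it is enough to remember how often each prefix count has occurred.

```python
from collections import Counter


def count_k_ones(k, bin_num):
    """Count substrings of bin_num containing exactly k '1' characters."""
    seen = Counter({0: 1})   # the empty prefix has zero ones
    ones = total = 0
    for ch in bin_num:
        ones += ch == '1'            # running count of ones so far
        total += seen[ones - k]      # earlier prefixes that leave exactly k ones
        seen[ones] += 1
    return total
```

With k = 2 and the string 01010 from the question, this gives 4, matching the four expected substrings. It also handles k = 0 without a separate branch.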

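Edit

To be fair, computers have kept improving since 2013, so I decided to upload the Python 2 solution once more, just to make the comparison fair... Well, it seems the rumor still holds: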

https://codeforces.com/contest/165/submission/115434939

Python: using a variable as the repetition count in a regular expression
