Question

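I am trying to match many regular expressions against one long string and, for each match, count the delimiters that precede it. I use multiprocessing to search for several regular expressions at the same time: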

import re
from multiprocessing.dummy import Pool

# One regular expression per line; strip the trailing newline
# so that it does not become part of the pattern.
with open('many_regex', 'r') as f:
    sch = [line.rstrip('\n') for line in f]

with open('big_string', 'r') as f:
    text = f.read()

def search_sch(sch, text=text):
    # Running count of '##' delimiters seen before each match.
    delim_index = []
    last_found = 0
    for match in re.finditer(sch, text):
        count_delims = len(re.findall('##', text[last_found:match.start()]))
        if delim_index:
            count_delims += delim_index[-1]
        delim_index.append(count_delims)
        last_found = match.end()
    return delim_index

with Pool(8) as threadpool:
    matches = threadpool.map(search_sch, sch[:100])

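Running threadpool.map takes about 25 seconds and uses only one CPU core. Any idea why more cores are not being used? Also, is there a Python library that can do this quickly?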

Answer 1

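The Pool class from multiprocessing.dummy uses threads, not processes. That makes the global interpreter lock (GIL) a problem: only one thread can run Python code at a time, so the searches never occupy more than one core. You want to use actual multiprocessing; to do that, replace

from multiprocessing.dummy import Pool

with

from multiprocessing import Pool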
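As a rough illustration of that change, here is a minimal sketch of the search running in a process-based pool. The names delim_counts, search_all and _init_worker are hypothetical and not from the question. It assumes the long string is handed to each worker once through the pool's initializer rather than with every task, and it swaps re.findall for str.count to count the '##' delimiters, which should give the same totals for a fixed delimiter.

```python
import re
from multiprocessing import Pool

_text = None  # set once per worker process by _init_worker (hypothetical helper)

def _init_worker(text):
    """Store the long string in the worker so it is not re-sent with each task."""
    global _text
    _text = text

def delim_counts(pattern, text=None, delim="##"):
    """Return, for each match of pattern, the running count of delimiters before it."""
    if text is None:
        text = _text
    counts = []
    total = 0
    last_found = 0
    for match in re.finditer(pattern, text):
        # Count delimiters between the previous match and this one.
        total += text.count(delim, last_found, match.start())
        counts.append(total)
        last_found = match.end()
    return counts

def search_all(patterns, text, processes=8):
    """Run delim_counts for every pattern in a pool of worker processes."""
    with Pool(processes, initializer=_init_worker, initargs=(text,)) as pool:
        return pool.map(delim_counts, patterns)
```

With real processes, the call that starts the pool, for example matches = search_all(sch[:100], text), should sit under an if __name__ == '__main__': guard, since platforms that start workers by spawning a fresh interpreter import the main module again in each worker.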
