Question

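I am trying to use multithreading in Python to go through a large amount of text (characters) and count how many times the same character occurs.

How do I set this up so that I can use the approach above with multiple threads and process the file to count the occurrences of a character?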

Answer 1

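There are three problems here.

Your program tries to read the same physical file from several threads at once. That makes no sense and may not even be feasible at the operating system or hardware level. What does work is to first read the whole file in the main thread and split it into several strings, then have the threads operate on the strings (not on the file).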
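
A minimal sketch of that first step, assuming hypothetical helper names `split_into_chunks` and `read_chunks` (neither comes from the question): the main thread reads the file once and slices the resulting string into roughly equal pieces, one per worker.

```python
def split_into_chunks(text, n_chunks):
    """Split text into at most n_chunks contiguous pieces of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Ceiling division so the pieces cover the whole string;
    # max(1, ...) keeps the range step valid for empty text.
    chunk_size = max(1, -(-len(text) // n_chunks))
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def read_chunks(path, n_chunks):
    """Read the file once, in the calling thread, and return its chunks."""
    with open(path, "r") as f:
        return split_into_chunks(f.read(), n_chunks)
```

Because the file is opened and closed before any thread starts, the workers only ever see ordinary strings.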

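The second problem is that you return a value from the counter function, but the returned value has nowhere to go. There is no caller. There is no such concept as "returning" a value from one thread to another. Instead, the thread must store the value somewhere, and code in the main thread can then access it. A quick and dirty solution is to create a list and pass each thread an index. When it finishes, the thread uses that index to store its result in the list.

The third problem is that you have to wait for the threads to finish. Threads have a .join() method for this purpose. Only after all threads have finished can you print the sum of all the counts.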
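
The second and third points can be sketched together. This is only an illustration under my own assumptions: `count_char` and `threaded_count` are hypothetical names, and the chunks are assumed to be strings that were already read in the main thread.

```python
import threading


def count_char(chunk, char, results, index):
    """Worker: count char in chunk and store the count in results[index]."""
    results[index] = chunk.count(char)


def threaded_count(chunks, char):
    """Count char across all chunks, using one thread per chunk."""
    # One slot per thread; each thread writes only to its own index.
    results = [0] * len(chunks)
    threads = []
    for index, chunk in enumerate(chunks):
        t = threading.Thread(target=count_char, args=(chunk, char, results, index))
        t.start()
        threads.append(t)
    # Wait for every thread to finish before reading the results.
    for t in threads:
        t.join()
    return sum(results)


print(threaded_count(["banana", "bandana", "cabana"], "a"))
```

Each thread touches a different list slot, so no lock is needed here, and the sum is only taken after every `join()` has returned.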

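This may be a useful learning exercise for you, but it will certainly be slower than doing the whole thing in a single thread. Threads are valuable when you have to wait for some event to happen and want to make progress on other tasks in the meantime. For simple number crunching, threads offer no advantage (that is what multiprocessing is for).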
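
For comparison, a single-threaded baseline might look like the sketch below. The function names are hypothetical; `str.count` and `collections.Counter` are from the standard library.

```python
from collections import Counter


def count_char_single(text, char):
    """Count one character in the calling thread, with no worker threads."""
    return text.count(char)


def count_all_chars(text):
    """Count every distinct character in a single pass over the text."""
    return Counter(text)


sample = "hello world"
print(count_char_single(sample, "l"))
print(count_all_chars(sample).most_common(1))
```

If you want to see the difference for yourself, time this against the threaded version on the same text.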

Answer 2

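Rather than using several threads on a single file, I suggest first reading the contents into a string and then splitting it into equal parts.

I encourage you to use a helper class derived from threading.Thread, so that each thread has its own counter.

Code example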

import threading


class Counter(threading.Thread):
    def __init__(self, text, char):
        super().__init__()
        self.counter = 0
        self.text = text
        self.char = char

    def run(self):
        for c in self.text:
            if c == self.char:
                self.counter += 1


if __name__ == "__main__":
    char = "a"
    number_of_threads = 3
    threads = []
    counter = 0
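    # Read the whole file in the main thread before starting any threads
    with open("my_file.txt", "r") as file:
        text = file.read()
    # Split the text into number_of_threads roughly equal parts
    # (ceiling division; max(1, ...) keeps the step valid for an empty file)
    part_size = max(1, -(-len(text) // number_of_threads))
    parts = [text[i:i + part_size] for i in range(0, len(text), part_size)]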
    # Create and start a thread for each part
    for part in parts:
        thread = Counter(part, char)
        thread.start()
        threads.append(thread)
    # Join your threads and collect their numbers
    for thread in threads:
        thread.join()
        counter += thread.counter

    print(counter)
