Distributing a process in Python

Time: 2019-07-03 06:25:40

Tags: python multithreading multiprocessing

I have a Python script that loads a machine learning model and classifies text files. My script looks like this:

import sys
for test_file in sys.argv[1:]:
    classify(test_file)

I run the script on 10000 files with the python command, passing the file paths as command-line arguments.

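my_dir contains the 10000 text files to classify. The files are processed independently, and I would like to know whether the work can be distributed using threads. One solution would be to split the files into separate folders and run the command on each folder separately, but that does not seem like the best solution.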

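2 answers:

Answer 0: (score: 1)

Would a simple pool do? Whether you are better off with processes or threads depends on the workload. My guess is processes, which is usually the case in Python, because CPU-bound work running in threads is held back by CPython's global interpreter lock.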

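from multiprocessing import Pool
# from multiprocessing.pool import ThreadPool as Pool  # use this import instead for threads
import sys


def classify(filename):
    # Placeholder for the real classification of one file.
    print("classified", filename)


if __name__ == '__main__':
    # With no argument, the pool sizes itself to the number of available CPUs.
    with Pool() as p:
        # map() blocks until every file has been processed.
        p.map(classify, sys.argv[1:])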

Use one or the other of the import statements to choose between processes and threads. The two pools have exactly the same interface.
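Because the two pools share an interface, the choice can also be made at run time rather than by editing the import. The following is only a minimal sketch: classify_all, its use_threads and workers parameters, and the stand-in body of classify are all hypothetical names introduced for illustration, not part of the original script.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def classify(filename):
    # Hypothetical stand-in for the real classifier: returns a dummy label.
    return filename, len(filename) % 2


def classify_all(filenames, use_threads=False, workers=None):
    # Both pool classes accept the same arguments and expose the same map().
    pool_cls = ThreadPool if use_threads else Pool
    # workers=None lets the pool choose its size from the CPU count.
    with pool_cls(workers) as pool:
        # map() returns the results in the same order as the inputs.
        return pool.map(classify, filenames)
```

When use_threads is False, call classify_all from under an if __name__ == '__main__': guard, as in the answer's own code, so that worker processes can import the module safely.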

Answer 1: (score: 0)

I suggest simply defining several threads (one per processor core) and splitting the files evenly among them.

import sys
from threading import Thread


def classify(filename):
    # Placeholder for the classify() function from the question.
    print("classified", filename)


class Distribute(Thread):
    """Thread that classifies its own share of the files."""

    def __init__(self, files):
        Thread.__init__(self)
        self.files = files

    def run(self):
        for file in self.files:
            classify(file)


files = sys.argv[1:]
numberOfThread = 4  # ideally the number of processor cores
# Thread i takes every numberOfThread-th file starting at index i, so no
# file is left out when the count is not a multiple of numberOfThread.
threads = [Distribute(files[i::numberOfThread]) for i in range(numberOfThread)]
for thread in threads:
    thread.start()
print("All threads running")
for thread in threads:
    thread.join()
print("processing completed")
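When files are split among a fixed number of threads, the file count is often not an exact multiple of the thread count. As a rough sketch of one way to handle that, the hypothetical helper below (split_evenly is a name made up for this example) cuts a list into contiguous chunks whose sizes differ by at most one, so every file lands in exactly one chunk.

```python
def split_evenly(items, n_chunks):
    # base is the minimum chunk size; the first `extra` chunks get one more item.
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

Each chunk could then be handed to one thread, for example split_evenly(sys.argv[1:], 4) for four threads.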