I have a Python script that loads a machine learning model and classifies text files. My script looks like this:
I am running the script on 10,000 files using the python command.
    import sys
    for test_file in sys.argv[1:]:
        classify(test_file)
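my_dir contains the 10,000 text files to classify. Each file is processed independently, so I would like to know whether I can distribute the work using threads. One option is to split the files into separate folders and run the command on each folder separately, but that does not seem like the best solution.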
Answer 0 (score: 1)
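Would a simple pool do? Whether processes or threads work better depends on your workload. My guess is processes, which is usually the better choice in Python for CPU-bound work because of the global interpreter lock (GIL).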
    from multiprocessing import Pool
    # from multiprocessing.pool import ThreadPool as Pool
    import sys

    def classify(filename):
        # Placeholder for the real classification code.
        print("classified ", filename)

    if __name__ == '__main__':
        p = Pool()  # defaults to one worker per CPU core
        p.map(classify, sys.argv[1:])
        p.close()
        p.join()
Use one or the other of the two import statements to choose between processes and threads. Both pools have exactly the same interface.
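Since the question mentions that the script loads a machine learning model, it may help to load that model once per worker rather than once per file. The following is only a minimal sketch of that idea using the pool's `initializer` argument. `load_model`, `init_worker`, `classify_all` and the toy "model" are hypothetical stand-ins, not the asker's actual code, and `classify` here labels the file name instead of reading the file.

```python
import sys
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

_model = None  # per-worker global, set by init_worker


def load_model():
    """Hypothetical stand-in for the real (expensive) model loading."""
    return lambda text: "long" if len(text) > 20 else "short"


def init_worker():
    # Runs once in every worker when the pool starts.
    global _model
    _model = load_model()


def classify(filename):
    # Placeholder: labels the file name itself instead of the file contents.
    return filename, _model(filename)


def classify_all(filenames, use_threads=False):
    # Both pool classes accept the same arguments, so only the class changes.
    pool_cls = ThreadPool if use_threads else Pool
    with pool_cls(initializer=init_worker) as pool:
        return pool.map(classify, filenames)


if __name__ == "__main__":
    files = sys.argv[1:]
    if files:
        for name, label in classify_all(files):
            print(name, label)
```

`pool.map` returns the results in the same order as the input list, so each label can be matched back to its file.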
Answer 1 (score: 0)
I suggest simply defining several threads (one per processor core) and dividing the files evenly among them.
    import sys
    from threading import Thread

    class Distribute(Thread):
        """Thread that classifies its own share of the files."""
        def __init__(self, files):
            Thread.__init__(self)
            self.files = files

        def run(self):
            for file in self.files:
                classify(file)  # classify() is the function from the question

    files = sys.argv[1:]
    numberOfThread = 4  # ideally one per processor core

    # Thread i takes every numberOfThread-th file starting at index i, so no
    # file is skipped when the count is not a multiple of numberOfThread.
    threads = [Distribute(files[i::numberOfThread]) for i in range(numberOfThread)]
    for thread in threads:
        thread.start()
    print("All threads running")
    for thread in threads:
        thread.join()
    print("processing completed")
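One detail worth checking when dividing files among threads is the remainder. If each thread gets a slice of `len(files) // n` items, 10 files over 4 threads gives slices of 2 and only 8 files are covered. The sketch below shows one way to avoid that by dealing the items out round-robin, and it takes the thread count from `os.cpu_count()`. The helper names `split_round_robin` and `run_in_threads` are made up for illustration.

```python
import os
from threading import Thread


def split_round_robin(items, n_chunks):
    """Deal items into n_chunks lists so that no item is left out."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def _work(func, chunk):
    # Each thread applies func to every item in its own chunk.
    for item in chunk:
        func(item)


def run_in_threads(func, items, n_threads=None):
    # Fall back to one thread if the core count cannot be determined.
    n_threads = n_threads or os.cpu_count() or 1
    chunks = split_round_robin(items, n_threads)
    threads = [Thread(target=_work, args=(func, chunk)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    print(split_round_robin(list(range(10)), 4))
    # prints [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

The chunk sizes differ by at most one, so the work stays evenly spread as long as the files take roughly the same time to process.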