I have a Python script that tests a processing server. Once an upload completes, the uploaded file is renamed to indicate that it is ready for processing. To test under heavy load I have made the script multithreaded, but when I watch the upload directory on the server, only one file comes in at a time. Here is the relevant code:
def worker(queue):
    while queue.qsize():
        try:
            infile, filename = queue.get()
        except Queue.Empty:
            return
        size = os.path.getsize(infile)
        copyStart = time.time()
        print '{}: {} started'.format(time.asctime(), filename)
        os.system('ssh servername "cat > {0} && mv {0} {1}" < {2}'.format(filename, filename.replace('upl', 'jpg'), infile))
        print '{}: {} took {} secs for {} bytes'.format(time.asctime(), filename, time.time() - copyStart, size)

q = Queue.Queue()
for media_type, num in config.get("media").items():
    media_dir = media_dir_format.format(media_type)
    print '\nLoading media from ' + media_dir
    itemId = startId
    for i in range(num):
        infile = media_dir + random.choice([x for x in os.listdir(media_dir) if x[-3:].lower() == 'jpg'])
        filename = output_format.format(itemId, media_type[:-1])
        q.put((infile, filename))
        itemId += 1

threads = []
for i in range(config.get("threads")):
    t = threading.Thread(target=worker, args=(q, ))
    t.start()
    threads.append(t)
So basically I fill a queue with mappings from randomly chosen input files to well-formed output names, then start off as many threads as are specified in the test config file. The problem is that even though the os.system calls start at roughly the same time, the uploads only happen one after another, as can be seen from the script output:
Fri Jul 11 17:06:44 2014: /bla/foo/b1.upl started
Fri Jul 11 17:06:44 2014: /blah/foo/b2.upl started
Fri Jul 11 17:06:44 2014: /blah/foo/b3.upl started
Fri Jul 11 17:07:03 2014: /blah/foo/b1.upl took 19.0852029324 secs for 8947009 bytes
Fri Jul 11 17:07:03 2014: /blah/foo/b4.upl started
Fri Jul 11 17:07:21 2014: /blah/foo/b3.upl took 36.8071010113 secs for 8348547 bytes
Fri Jul 11 17:07:21 2014: /blah/foo/b5.upl started
Fri Jul 11 17:07:40 2014: /blah/foo/b2.upl took 55.855271101 secs for 8348547 bytes
That is just a small excerpt, but you can see the successive uploads taking longer and longer. I find it hard to believe this is an ssh issue, since I can ssh into multiple shells at once without any problem, and a little searching turns up plenty of examples of people using os.system concurrently from multiple threads. So where is the bottleneck?
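For what it's worth, the claim that os.system calls launched from separate threads can overlap is easy to sanity-check with a standalone snippet like the one below; it is not part of the script above, and 'sleep 2' is just a hypothetical stand-in for the long-running ssh transfer:

import os
import threading
import time

THREADS = 5  # arbitrary thread count for this check

def sleeper():
    # stand-in for the long-running upload command
    os.system('sleep 2')

start = time.time()
workers = [threading.Thread(target=sleeper) for _ in range(THREADS)]
for t in workers:
    t.start()
for t in workers:
    t.join()
# with real concurrency this should report roughly 2 secs, not THREADS * 2
print '{} os.system calls finished in {:.1f} secs'.format(THREADS, time.time() - start)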
Answer 0 (score: 0)
The problem seems to be something in the underlying behaviour of os.system. Replacing that usage with the newer subprocess.call makes it work as expected, as long as shell=True is passed:
subprocess.call(
    'ssh servername "cat > {0} && mv {0} {1}" < {2}'.format(
        filename,
        filename.replace('upl', 'jpg'),
        infile
    ),
    shell=True
)
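For context, here is a sketch of how the worker from the question might look with that substitution in place; it reuses the same placeholder names (servername, the upl/jpg suffixes), and the switch to get_nowait() is a small side adjustment so that Queue.Empty can actually be raised when the queue drains:

import os
import subprocess
import time
import Queue

def worker(queue):
    while queue.qsize():
        try:
            # non-blocking get, so Queue.Empty is raised once the queue is empty
            infile, filename = queue.get_nowait()
        except Queue.Empty:
            return
        size = os.path.getsize(infile)
        copyStart = time.time()
        print '{}: {} started'.format(time.asctime(), filename)
        # same shell pipeline as before, run through subprocess instead of os.system
        subprocess.call(
            'ssh servername "cat > {0} && mv {0} {1}" < {2}'.format(
                filename,
                filename.replace('upl', 'jpg'),
                infile
            ),
            shell=True
        )
        print '{}: {} took {} secs for {} bytes'.format(
            time.asctime(), filename, time.time() - copyStart, size)

Note that shell=True is still required here because the command relies on shell features (the < redirection and &&).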