Upload then rename multiple files via ssh

Date: 2014-07-11 21:24:18

Tags: python multithreading ssh os.system

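I have a python script that tests a processing server. After an upload completes, the uploaded file is renamed to indicate that it is ready for processing. To test heavy load I made the script multithreaded, but when I watch the upload directory on the server, only one file comes in at a time. Here is the relevant code: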

def worker(queue):
    while True:
        try:
            # Non-blocking get, so Queue.Empty is actually raised once the queue is drained
            infile, filename = queue.get_nowait()
        except Queue.Empty:
            return

        size = os.path.getsize(infile)

        copyStart = time.time()

        print '{}: {} started'.format(time.asctime(), filename)

        os.system('ssh servername "cat > {0} && mv {0} {1}" < {2}'.format(filename, filename.replace('upl', 'jpg'), infile))

        print '{}: {} took {} secs for {} bytes'.format(time.asctime(), filename, time.time() - copyStart, size)

q = Queue.Queue()

for media_type, num in config.get("media").items():
    media_dir = media_dir_format.format(media_type)
    print '\nLoading media from ' + media_dir
    itemId = startId

    for i in range(num):
        infile = media_dir + random.choice([x for x in os.listdir(media_dir) if x[-3:].lower() == 'jpg'])

        filename = output_format.format(itemId, media_type[:-1])

        q.put((infile, filename))


        itemId += 1

threads = []

for i in range(config.get("threads")):
    t = threading.Thread(target=worker, args=(q, ))
    t.start()
    threads.append(t)

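So basically I fill the queue with mappings between randomly chosen input files and well-formatted output names, then start however many threads are specified in the test config file. The problem is that even though the os.system calls are executed at about the same time, the uploads only happen one after another, as the script output shows: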

Fri Jul 11 17:06:44 2014: /blah/foo/b1.upl started
Fri Jul 11 17:06:44 2014: /blah/foo/b2.upl started
Fri Jul 11 17:06:44 2014: /blah/foo/b3.upl started
Fri Jul 11 17:07:03 2014: /blah/foo/b1.upl took 19.0852029324 secs for 8947009 bytes
Fri Jul 11 17:07:03 2014: /blah/foo/b4.upl started
Fri Jul 11 17:07:21 2014: /blah/foo/b3.upl took 36.8071010113 secs for 8348547 bytes
Fri Jul 11 17:07:21 2014: /blah/foo/b5.upl started
Fri Jul 11 17:07:40 2014: /blah/foo/b2.upl took 55.855271101 secs for 8348547 bytes

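This is only a small sample, but you can see that each successive upload takes longer. I find it hard to believe this is an ssh problem, because I can ssh into multiple shells at once without issue, and a little searching turns up plenty of examples of people using os.system concurrently from multiple threads. So where is the bottleneck?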

1 Answer:

Answer 0 (score: 0)

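This appears to be an underlying issue with os.system. Replacing that call with the newer subprocess.call works as expected, as long as shell is True (the command is a single string that relies on shell input redirection):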

subprocess.call(
    'ssh servername "cat > {0} && mv {0} {1}" < {2}'.format(
        filename, 
        filename.replace('upl', 'jpg'),
        infile
    ),
    shell=True
)
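
For reference, here is a minimal sketch of how the same upload-then-rename step might look without shell=True, by passing an argument list and handing the local file to ssh as stdin. It is written for Python 3, which is an assumption, since the code above is Python 2. The helper names ssh_upload_command, worker and run_jobs are hypothetical, and 'servername' is simply the host used in the question. This is a variation on the answer, not something tested against the original setup.

```python
import queue
import shlex
import subprocess
import threading
import time


def ssh_upload_command(filename, host='servername'):
    """Build the argument list for the upload-then-rename step (hypothetical helper)."""
    final_name = filename.replace('upl', 'jpg')
    # The remote side still runs a shell, so quote the paths for it.
    remote = 'cat > {0} && mv {0} {1}'.format(
        shlex.quote(filename), shlex.quote(final_name))
    return ['ssh', host, remote]


def worker(jobs, results):
    """Drain the queue, running one child process per job."""
    while True:
        try:
            command, infile = jobs.get_nowait()
        except queue.Empty:
            return
        start = time.time()
        # The open file becomes the child's stdin, replacing '< infile'.
        with open(infile, 'rb') as src:
            code = subprocess.call(command, stdin=src)
        results.append((infile, code, time.time() - start))


def run_jobs(job_list, num_threads):
    """Run every (command, infile) pair on num_threads worker threads."""
    jobs = queue.Queue()
    for job in job_list:
        jobs.put(job)
    results = []
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call such as run_jobs([(ssh_upload_command(filename), infile)], 3) would then return a list of (infile, exit code, seconds) tuples, which makes it easy to compare per-file timings with the output shown in the question.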