Question

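I have Python code that takes a bunch of tasks and distributes them to either different threads or different nodes on a cluster. I always end up writing a main script, driver.py, that takes two command-line arguments: --run-all and --run-task. The first is just a wrapper that iterates over all the tasks and then calls driver.py --run-task with each task passed as the argument. For example: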

== driver.py ==
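import os
from optparse import OptionParser

# Determine the current script
DRIVER = os.path.abspath(__file__)

# Option setup (assumed: optparse, matching the parse_args() call below)
parser = OptionParser()
parser.add_option("--run-all")
parser.add_option("--run-task")
(opts, args) = parser.parse_args()
if opts.run_all is not None:
    # Run all tasks
    for task in opts.run_all.split(","):
        # Call driver.py again with a specific task
        cmd = "python %s --run-task %s" % (DRIVER, task)
        # Execute on system (distribute_cmd is defined elsewhere)
        distribute_cmd(cmd)
elif opts.run_task is not None:
    # Run on an individual task
    # code here for processing a task...
    pass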

The user would then call:

$ driver.py --run-all task1,task2,task3,task4

and each task gets distributed.

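The function distribute_cmd takes a shell-executable command and sends it off, in a system-specific way, to a node or a thread. The reason driver.py has to find its own name and call itself is that distribute_cmd needs an executable shell command; it cannot take a function name, for example.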

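This consideration led me to this design: a driver script with two modes that has to call itself. This has two complications: (1) the script has to find its own path through __file__, and (2) when turning this into a Python package, it is unclear where driver.py should go. It is meant to be an executable script, but if I put it in setup.py's scripts=, then I will have to figure out where the scripts live (see "correct way to find scripts directory from setup.py in Python distutils?"). This does not seem like a good solution.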

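What is an alternative design to this? Keep in mind that distributing the tasks has to result in an executable command that can be passed as a string to distribute_cmd. Thanks.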

Answer 1

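What you are looking for is a library that already does exactly what you need, such as Fabric or Celery.
If you were not using nodes, I would suggest multiprocessing.
This question is similar to this one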
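For the single-machine case, the multiprocessing suggestion can be sketched as below. This is only a minimal illustration: run_task, run_all, and the use_threads switch are hypothetical names, and the task body is a placeholder for the real per-task work.

```python
import multiprocessing
import multiprocessing.dummy


def run_task(task):
    """Placeholder for the real per-task work (hypothetical)."""
    return "%s done" % task


def run_all(tasks, workers=2, use_threads=False):
    """Run every task in a local pool and return the results in input order."""
    # multiprocessing.dummy.Pool has the same API but is backed by threads.
    pool_cls = multiprocessing.dummy.Pool if use_threads else multiprocessing.Pool
    with pool_cls(workers) as pool:
        return pool.map(run_task, tasks)
```

With process pools, the call to run_all should sit under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. No shell command string is needed here, because the pool takes a function directly.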

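To be able to execute remotely, you need one of the following:

ssh access to the box, in which case you can use Fabric to send your commands.
A server: SocketServer, a TCP server, or anything else that accepts connections.
An agent or client waiting for data. If you are using an agent, you may as well use a broker for your messages. Celery lets you do some of the plumbing with a queue: one end puts messages on the queue while the other end gets messages from it. If the messages are commands to execute, the agent can make an os.system() call or call subprocess.Popen().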
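The agent idea in the last item can be sketched in-process. Here a queue.Queue and a thread stand in for a real broker and a remote agent, which is an assumption made only to keep the example runnable on one machine. The names agent, run_through_agent, and STOP are hypothetical, and error handling is omitted.

```python
import queue
import shlex
import subprocess
import threading

STOP = None  # sentinel that tells the agent to exit


def agent(commands, results):
    """Take command strings off the queue and run each one."""
    while True:
        command = commands.get()
        if command is STOP:
            break
        # shlex.split turns the string into argv, so no shell is involved.
        done = subprocess.run(shlex.split(command), capture_output=True, text=True)
        results.put((command, done.returncode, done.stdout))


def run_through_agent(command_list):
    """Send commands to one agent thread and collect (cmd, code, stdout)."""
    commands, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=agent, args=(commands, results))
    worker.start()
    for command in command_list:
        commands.put(command)
    commands.put(STOP)
    worker.join()
    return [results.get() for _ in command_list]
```

In a real deployment the queue would live in the broker and the agent loop would run on each node.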

Celery example:

import os
from celery import Celery

celery = Celery('tasks', broker='amqp://guest@localhost//')

@celery.task
def run_command(command):
    return os.system(command)

You will then need a worker that binds to the queue and waits for tasks to execute. More info in the documentation

Fabric example:

Code:

from fabric.api import run
def exec_remotely(command):
    run(command)

Invocation:

$ fab exec_remotely:command='ls -lh'

More info in the documentation

Batch system case

Going back to the question...

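distribute_cmd could be something that calls bsub somescript.sh
You only need to find the file because you are going to re-execute the same script with different parameters.
Because of the above, you may have trouble providing a correct distutils script.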
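One possible way around the file lookup, offered only as a sketch, is to build the command around `python -m` so the driver is found by module name rather than by path. This assumes the driver is importable on every node. The module name mypackage.driver and both helper names are hypothetical.

```python
import shlex
import sys

DRIVER_MODULE = "mypackage.driver"  # hypothetical module name


def build_task_command(task, module=DRIVER_MODULE):
    """Return a shell command string that runs one task via `python -m`."""
    argv = [sys.executable, "-m", module, "--run-task", task]
    # Quote each argument so the joined string is safe to hand to a shell.
    return " ".join(shlex.quote(arg) for arg in argv)


def build_all_commands(task_list):
    """Turn a comma-separated task list into one command string per task."""
    return [build_task_command(task) for task in task_list.split(",")]
```

Each returned string could then be handed to distribute_cmd unchanged, with no use of __file__ and no need to know where the scripts directory is.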

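Let's question this design.

Why do you need to use the same script?
Can your driver write scripts and then call bsub?
Can you use temporary files?
Do all the nodes actually share a filesystem?
How do you know the file will exist on the node?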

Example:

import os

TASK_CODE = {
    'TASK1': '''#!/usr/bin/env python
#... actual code for task1 goes here ...
''',
    'TASK2': '''#!/usr/bin/env python
#... actual code for task2 goes here ...
'''}
# driver portion (parser and distribute_cmd are defined as in the question)
(opts, args) = parser.parse_args()
if opts.run_all is not None:
    for task in opts.run_all.split(","):
        task_path = '/tmp/taskfile_%s' % task
        with open(task_path, 'w') as task_file:
            task_file.write(TASK_CODE[task])
        # make the file executable so its path works as a shell command
        os.chmod(task_path, 0o755)
        # note: should probably do better error handling.
        distribute_cmd(task_path)
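The fixed /tmp path in the example can collide when two runs use the same task name. A minimal sketch of the temporary-file variant follows, assuming a POSIX system. The helper name write_task_script is hypothetical, and directory would have to point at storage the nodes can actually see.

```python
import os
import stat
import tempfile


def write_task_script(task, code, directory=None):
    """Write code to a uniquely named executable file and return its path."""
    # delete=False keeps the file after the with-block so a node can run it.
    with tempfile.NamedTemporaryFile(
        "w", prefix="taskfile_%s_" % task, suffix=".py",
        dir=directory, delete=False
    ) as task_file:
        task_file.write(code)
    # Add the owner-execute bit so the path itself is a runnable command.
    mode = os.stat(task_file.name).st_mode
    os.chmod(task_file.name, mode | stat.S_IXUSR)
    return task_file.name
```

The returned path would be passed to distribute_cmd in place of task_path. The caller is responsible for removing the file once the task has finished.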

How to structure code that distributes jobs to threads/nodes in Python?
