Easy way to tell apart the operating system processes started by Python multiprocessing

Time: 2016-02-06 03:12:30

Tags: python linux python-multiprocessing

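Summary

I want to use the Python multiprocessing module to run several jobs in parallel on a Linux server. I also want to be able to look at the running processes with top or ps and kill one of them while letting the others keep running.

However, every process started from the Python multiprocessing module looks identical in the output of ps -f.

All I see is this: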

fermion:workspace ross$ ps -f
  UID   PID  PPID   C STIME   TTY           TIME CMD
  501 32257 32256   0  8:52PM ttys000    0:00.04 -bash
  501 32333 32257   0  9:05PM ttys000    0:00.04 python ./parallel_jobs.py
  501 32334 32333   0  9:05PM ttys000    0:00.00 python ./parallel_jobs.py
  501 32335 32333   0  9:05PM ttys000    0:00.00 python ./parallel_jobs.py
  501 32336 32333   0  9:05PM ttys000    0:00.00 python ./parallel_jobs.py
  501 32272 32271   0  8:53PM ttys001    0:00.05 -bash

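Is there a way to get something more descriptive in the CMD column? Do I need to track the PIDs in a log file? Or is there another option?

Background

I am doing some batch processing in which some jobs can run for hours. I need to run some of these jobs in parallel to save time, and all of those parallel jobs must complete successfully before another job that depends on them can run. However, if one job misbehaves, I want to be able to kill it while letting the others finish. So the pattern is: one job, then some jobs in parallel, then a few more jobs in sequence, then some more parallel jobs...

Example code

Here is some dummy code that outlines what I am trying to do.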

#!/usr/bin/env python
import multiprocessing
import sys  # needed for sys.exit() in main()
import time

def open_zoo_cages():
    print('Opening zoo cages...')

def crossing_road(animal, sleep_time):
    print('An ' + animal + ' is crossing the road')
    for i in range(5):
        print("It's a wide road for " + animal + " to cross...")
        time.sleep(sleep_time)

    print('The ' + animal + ' is across.')

def aardvark():
    crossing_road('aardvark', 2)

def badger():
    crossing_road('badger', 4)

def cougar():
    crossing_road('cougar', 3)

def clean_the_road():
    print('Cleaning off the road of animal droppings...')

def print_exit_code(process):
    print(process.name + " exit code: " + str(process.exitcode))

def main():
    # Run a single job that must finish before running some jobs in parallel
    open_zoo_cages()

    # Run some jobs in parallel
    amos = multiprocessing.Process(name='aardvark Amos', target=aardvark)
    betty = multiprocessing.Process(name='badger Betty', target=badger)
    carl = multiprocessing.Process(name='cougar Carl', target=cougar)

    amos.start()
    betty.start()
    carl.start()

    amos.join()
    betty.join()
    carl.join()

    print_exit_code(amos)
    print_exit_code(betty)
    print_exit_code(carl)

    # Run another job (clean_the_road) if all the parallel jobs finished in 
    # success. Otherwise end in error.
    if amos.exitcode == 0 and betty.exitcode == 0 and carl.exitcode == 0:
        clean_the_road()
    else:
        sys.exit('Not all animals finished crossing')

if __name__ == '__main__':
    main()

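Also, I noticed that putting one of the functions in another Python module does not change what appears in the CMD column of ps for the associated process.

Output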

fermion:workspace ross$ ./parallel_jobs.py 
Opening zoo cages...
An aardvark is crossing the road
It's a wide road for aardvark to cross...
An badger is crossing the road
It's a wide road for badger to cross...
An cougar is crossing the road
It's a wide road for cougar to cross...
It's a wide road for aardvark to cross...
It's a wide road for cougar to cross...
It's a wide road for aardvark to cross...
It's a wide road for badger to cross...
It's a wide road for cougar to cross...
It's a wide road for aardvark to cross...
It's a wide road for badger to cross...
It's a wide road for aardvark to cross...
It's a wide road for cougar to cross...
The aardvark is across.
It's a wide road for badger to cross...
It's a wide road for cougar to cross...
The cougar is across.
It's a wide road for badger to cross...
The badger is across.
aardvark Amos exit code: 0
badger Betty exit code: 0
cougar Carl exit code: 0
Cleaning off the road of animal droppings...

2 answers:

Answer 0 (score: 1)

The very simple answer: have each process open a descriptively named file handle, then use lsof.

f = open('/tmp/hippo.txt','w')

This will give you the PID of the process:

lsof | grep "hippo"

It's not the slickest answer, but what is :)
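A minimal sketch of the same idea applied to multiprocessing workers, assuming Linux (it asks for the "fork" start method explicitly). Each worker keeps a file open whose name contains its job name, so lsof | grep job-badger would show the PID. The temporary directory, the job-<name>.pid naming scheme and the function names are all made up for illustration. The worker also writes its PID into the file, so you can read it back without lsof.

```python
import multiprocessing
import os
import tempfile
import time

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def worker(job_name, marker_dir, sleep_time):
    # Hypothetical naming scheme: job-<name>.pid inside marker_dir.
    path = os.path.join(marker_dir, "job-" + job_name + ".pid")
    with open(path, "w") as marker:
        marker.write(str(os.getpid()))
        marker.flush()
        # The file stays open while the job runs, so lsof can see it.
        time.sleep(sleep_time)


def run_jobs(job_names, sleep_time=0.5):
    marker_dir = tempfile.mkdtemp(prefix="zoo-")
    procs = {
        name: ctx.Process(name=name, target=worker,
                          args=(name, marker_dir, sleep_time))
        for name in job_names
    }
    for p in procs.values():
        p.start()
    for p in procs.values():
        p.join()
    # Read back the PID each worker recorded in its marker file.
    recorded = {}
    for name in job_names:
        with open(os.path.join(marker_dir, "job-" + name + ".pid")) as f:
            recorded[name] = int(f.read())
    return procs, recorded
```

While the jobs are running, cat on the marker file or lsof on the directory tells you which PID belongs to which job.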

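My original answer was the simple approach. Here is an incomplete sketch of a bigger idea: add a signal handler to the class that runs as the subprocess, so that you can send it something like kill -6 to make it dump information. You could even use this to dump, on demand, how much work is left in a given subprocess.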

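import logging
import os
import queue
import signal

class Foo():
    def __init__(self, name):
        self.myname = name
        self.myqueue = queue.Queue()
        # kill -6 sends SIGABRT; create Foo inside the child process so the
        # handler is installed there
        signal.signal(signal.SIGABRT, self.my_callback)

    def my_callback(self, signum, frame):
        # Handlers receive the signal number and the current stack frame
        logging.error("%s %s %s", self.myname, os.getpid(), self.myqueue.qsize())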
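To make the signal idea concrete, here is a small self-contained sketch, again assuming Linux and the "fork" start method. It is my own illustration, not a finished design: it uses SIGUSR1 instead of signal 6 so that a plain kill still terminates the job, and it reports through a pipe instead of logging so the parent can read the answer. The names worker, report and ask_for_status are hypothetical.

```python
import multiprocessing
import os
import signal
import time

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def worker(name, total_steps, ready, status_conn):
    remaining = [total_steps]  # mutable so the handler sees updates

    def report(signum, frame):
        # Runs in the child whenever it receives SIGUSR1.
        status_conn.send((name, os.getpid(), remaining[0]))

    signal.signal(signal.SIGUSR1, report)
    ready.set()  # tell the parent the handler is installed
    while remaining[0] > 0:
        time.sleep(0.05)  # stand-in for one unit of real work
        remaining[0] -= 1


def ask_for_status(name="badger", total_steps=20):
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    ready = ctx.Event()
    proc = ctx.Process(name=name, target=worker,
                       args=(name, total_steps, ready, child_conn))
    proc.start()
    ready.wait()
    # Same effect as running `kill -USR1 <pid>` from a shell.
    os.kill(proc.pid, signal.SIGUSR1)
    # Wait up to 5 seconds for the child's report.
    status = parent_conn.recv() if parent_conn.poll(5) else None
    proc.join()
    return proc, status
```

The tuple that comes back is (job name, PID, steps remaining), which is the kind of on-demand progress dump described above.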

Or you could do this, which I think may be what you really want:

import multiprocessing
import time
def foo():
    time.sleep(60)
if __name__ == "__main__":
    process = [
        multiprocessing.Process(name="a",target=foo),
        multiprocessing.Process(name="b",target=foo),
        multiprocessing.Process(name="c",target=foo),
    ]
    for p in process:
        p.start()
    for p in process:
        print(p.name, p.pid)
    for p in process:
        p.join()
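Building on the name/PID listing, here is a sketch of the workflow from the question: keep a name-to-PID table, kill one misbehaving job, and let the rest finish. It assumes Linux and the "fork" start method; the job durations and the function name run_and_kill_one are invented for the example. A process killed by a signal reports a negative exit code (minus the signal number), so the parent can tell it apart from a clean finish.

```python
import multiprocessing
import os
import signal
import time

# Assumption: a POSIX system where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def job(sleep_time):
    time.sleep(sleep_time)  # stand-in for a long-running batch job


def run_and_kill_one(victim="badger"):
    # Hypothetical durations: badger plays the job that runs far too long.
    durations = {"aardvark": 0.2, "badger": 30, "cougar": 0.3}
    procs = {
        name: ctx.Process(name=name, target=job, args=(duration,))
        for name, duration in durations.items()
    }
    for p in procs.values():
        p.start()
    # Name -> PID table; in a real batch run this could go to a log file.
    pids = {name: p.pid for name, p in procs.items()}
    # What `kill <pid>` does from the shell.
    os.kill(pids[victim], signal.SIGTERM)
    for p in procs.values():
        p.join()
    return {name: p.exitcode for name, p in procs.items()}
```

With the table written to a log, you can look up the PID by job name and run kill by hand instead of calling os.kill from the parent.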

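Answer 1 (score: 1)

The psutil library can do what you need, and it is widely used. You can look at how the psutil developers do it, or just use the library in your own project.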

https://pypi.python.org/pypi/psutil