I am trying (unsuccessfully) to parallelize a loop using multiprocessing. Here is my Python code:
from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np

filename = 'traj_prot_nojump.nc'
trajectory = Trajectory(None, filename)
universe = trajectory.universe
proteins = universe.objectList(Protein)
chain = proteins[0][0]

def calpha_2dmap_mult(t = range(0,len(trajectory))):
    dist = []
    global trajectory
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

dist1 = calpha_2dmap_mult(range(1000,2000))
dist2 = calpha_2dmap_mult(range(2000,3000))

# Multiprocessing
from multiprocessing import Pool, cpu_count
pool = Pool(processes=2)
dist_pool = [pool.apply(calpha_2dmap_mult, args=(t,)) for t in [range(1000,2000), range(2000,3000)]]
print(dist_pool[0]==dist1)
print(dist_pool[1]==dist2)
If I try Pool(processes = 1), the code works as expected, but as soon as I ask for more than one process, the code crashes with this error:
python: posixio.c:286: px_pgin: Assertion `*posp == ((off_t)(-1)) || *posp == lseek(nciop->fd, 0, 1)' failed.
If anyone has a suggestion, it would be much appreciated ;-)
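Answer 0 (score: 0)
I suspect it is because of this:
trajectory = Trajectory(None, filename)
You open the file only once, at the start. You could probably just pass the filename to the multiprocessing target function and open the file there.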
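A minimal sketch of that pattern, assuming a plain text file as a stand-in for the netCDF trajectory (MMTK is not used here, and write_demo_file and sum_frames are hypothetical names introduced only for illustration). Each task carries the file name, and the worker opens the file itself instead of relying on a handle created in the parent:

```python
import os
import tempfile
from multiprocessing import Pool


def write_demo_file(n_frames=100):
    """Create a stand-in 'trajectory': one number per line (hypothetical data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        for n in range(n_frames):
            handle.write("%d\n" % n)
    return path


def sum_frames(task):
    """Worker: receives the file NAME and opens the file inside the process."""
    path, start, stop = task
    total = 0.0
    with open(path) as handle:  # opened here, not inherited from the parent
        for n, line in enumerate(handle):
            if start <= n < stop:
                total += float(line)
    return total


if __name__ == "__main__":
    path = write_demo_file()
    tasks = [(path, 0, 50), (path, 50, 100)]
    pool = Pool(processes=2)
    try:
        print(pool.map(sum_frames, tasks))
    finally:
        pool.close()
        pool.join()
        os.remove(path)
```

In the question's code, the equivalent change would be to call Trajectory(None, filename) inside calpha_2dmap_mult; that part is untested here.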
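Answer 1 (score: 0)
If you are running this code on OS X or any other Unix-like system, multiprocessing uses forking to create the child processes.
When forking, file descriptors are shared with the parent process. As far as I know, the trajectory object contains a reference to a file descriptor.
To fix this, you should put
trajectory = Trajectory(None, filename)
inside calpha_2dmap_mult, so that each child process opens the file separately.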
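The sharing described above can be seen with a small POSIX-only sketch that uses os.fork directly on a temporary file (no MMTK or netCDF involved; shared_offset_after_fork is a hypothetical name). A read in the child moves the file offset that the parent sees, which is the kind of mismatch the lseek assertion in the error message appears to be checking for:

```python
import os
import tempfile


def shared_offset_after_fork():
    """Return the parent's file offset after a forked child has read 4 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)  # rewind; the parent believes offset == 0
        pid = os.fork()
        if pid == 0:
            # Child: reads through the inherited descriptor, then exits
            # immediately without running any parent cleanup code.
            os.read(fd, 4)
            os._exit(0)
        os.waitpid(pid, 0)
        # Parent: the offset moved even though the parent never read anything.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(shared_offset_after_fork())
```

A library that caches its own idea of the file position in each process would disagree with the real offset as soon as two forked workers read through the same descriptor. Opening the file separately in each worker gives every process its own offset.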
Answer 2 (score: 0)
Here is the new script, which allows multiple processes to be used (but gives no performance improvement):
from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np
import time

filename = 'traj_prot_nojump.nc'
trajectory = Trajectory(None, filename)
universe = trajectory.universe
proteins = universe.objectList(Protein)
chain = proteins[0][0]

def calpha_2dmap_mult(trajectory = trajectory, t = range(0,len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000 # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain)-1):
                for j in np.arange(len(chain)-1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return(dist)

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
#dist1 = calpha_2dmap_mult(trajectory, range(0,11001))
c1 = time.time() - c0
print(c1)

# Multiprocessing
from multiprocessing import Pool, cpu_count
pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t,)) for t in
             [range(0,2001), range(3000,5001), range(6000,8001),
              range(9000,11001)]]
c1 = time.time() - c0
print(c1)

dist1 = np.array(dist1)
dist_pool = np.array(dist_pool)
dist_pool = dist_pool.flatten()
print(np.all((dist_pool == dist1)))
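The time spent computing the distances is "the same" without multiprocessing (70.1 s) and with it (70.2 s)! I was not necessarily expecting a 4x improvement, but I was at least expecting some improvement!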
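One likely contributor to the identical timings is that Pool.apply blocks until its result is ready, so a list comprehension over pool.apply runs the tasks one after another. The sketch below contrasts it with apply_async, using a sleep as a stand-in for the distance calculation (slow_square, run_blocking and run_async are hypothetical names). Whether the MMTK workload itself speeds up is not tested here and may also depend on file I/O:

```python
import time
from multiprocessing import Pool


def slow_square(x):
    """Stand-in for an expensive task: sleeps, then returns x squared."""
    time.sleep(0.5)
    return x * x


def run_blocking(pool, values):
    # pool.apply waits for each result before the next task is submitted,
    # so the tasks run one at a time.
    return [pool.apply(slow_square, args=(v,)) for v in values]


def run_async(pool, values):
    # apply_async submits every task first; get() collects the results later,
    # so the workers can run concurrently.
    pending = [pool.apply_async(slow_square, args=(v,)) for v in values]
    return [p.get() for p in pending]


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    pool = Pool(processes=4)
    try:
        for runner in (run_blocking, run_async):
            start = time.time()
            result = runner(pool, values)
            elapsed = time.time() - start
            print("%s %s %.2f s" % (runner.__name__, result, elapsed))
    finally:
        pool.close()
        pool.join()
```

With four workers, the blocking version should take roughly four sleeps and the asynchronous version roughly one. pool.map(slow_square, values) is a shorter way to get the concurrent behaviour when each task takes a single argument.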
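Answer 3 (score: 0)
It sounds like this could be a problem with reading netCDF files over NFS. Is traj_prot_nojump.nc on NFS storage? See this Unidata mailing list post and this post to the IDL newsgroup. The latter suggests a workaround: copy the file to local storage first.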
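A minimal sketch of that workaround, assuming the system temporary directory sits on local disk rather than on NFS (worth checking on a cluster). copy_to_local is a hypothetical helper name, and the demo copies a small temporary file rather than a real trajectory:

```python
import os
import shutil
import tempfile


def copy_to_local(path, local_dir=None):
    """Copy `path` into a local directory and return the path of the copy."""
    if local_dir is None:
        # Assumption: the default temp directory is on local storage.
        local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(path))
    shutil.copy(path, local_path)
    return local_path


if __name__ == "__main__":
    # Demo with a throwaway file standing in for traj_prot_nojump.nc.
    fd, source = tempfile.mkstemp(suffix=".nc")
    os.write(fd, b"demo")
    os.close(fd)
    local = copy_to_local(source)
    print(local)
    shutil.rmtree(os.path.dirname(local))
    os.remove(source)
```

The returned path would then be passed to Trajectory(None, ...) in place of the NFS path.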