How to print a predefined sequence in Python

Time: 2016-10-13 07:48:12

Tags: python python-2.7 python-3.x bioinformatics biopython

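I am trying to print a predefined sequence from PDB input files in Python, but I am not getting the expected result. I am new to Python. I also tried to read the files from a directory, but that does not work. Nothing is displayed and I cannot find the error; the script just runs without producing any output.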

import os

os.chdir('C:\Users\Vishnu\Desktop\Test_folder\Input')


for path, dirs, pdbfile in os.walk('/C:\Users\Vishnu\Desktop\Test_folder\Input'):
for line in pdbfile:
    if line[:6] != "HETATM":
        continue
    chainID = line[21:22]
    atomID = line[13:16].strip()
    if chainID not in ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'):
        continue
    if atomID not in ('C4B', 'O4B', 'C1B', 'C2B', 'C3B'):
        continue
    with open('C:\Users\Vishnu\Desktop\Test_folder\Input', 'r') as fh:
        new = [line.rstrip() for line in fh]
    with open('C:\Users\Vishnu\Desktop\Test_folder\Output', 'w') as fh:
        [fh.write('%s\n' % line) for line in new]
        fh.write((line.rstrip()))

Expected output:

HETATM 3788  C4B NAI A 302      52.695  15.486   8.535  1.00 57.28           C  
HETATM 3789  O4B NAI A 302      52.258  14.631   7.456  1.00 56.26           O  
HETATM 3794  C1B NAI A 302      53.348  13.816   7.022  1.00 53.44           C 
HETATM 3792  C2B NAI A 302      54.537  14.748   7.190  1.00 50.93           C  

HETATM 3789  O4B NAI A 302      52.258  14.631   7.456  1.00 56.26           O  
HETATM 3794  C1B NAI A 302      53.348  13.816   7.022  1.00 53.44           C 
HETATM 3792  C2B NAI A 302      54.537  14.748   7.190  1.00 50.93           C 
HETATM 3790  C3B NAI A 302      54.225  15.525   8.465  1.00 52.99           C  

HETATM 3794  C1B NAI A 302      53.348  13.816   7.022  1.00 53.44           C 
HETATM 3792  C2B NAI A 302      54.537  14.748   7.190  1.00 50.93           C 
HETATM 3790  C3B NAI A 302      54.225  15.525   8.465  1.00 52.99           C  
HETATM 3788  C4B NAI A 302      52.695  15.486   8.535  1.00 57.28           C  

HETATM 3792  C2B NAI A 302      54.537  14.748   7.190  1.00 50.93           C 
HETATM 3790  C3B NAI A 302      54.225  15.525   8.465  1.00 52.99           C  
HETATM 3788  C4B NAI A 302      52.695  15.486   8.535  1.00 57.28           C  
HETATM 3789  O4B NAI A 302      52.258  14.631   7.456  1.00 56.26           O  

HETATM 3790  C3B NAI A 302      54.225  15.525   8.465  1.00 52.99           C  
HETATM 3788  C4B NAI A 302      52.695  15.486   8.535  1.00 57.28           C  
HETATM 3789  O4B NAI A 302      52.258  14.631   7.456  1.00 56.26            O  
HETATM 3794  C1B NAI A 302      53.348  13.816   7.022  1.00 53.44           C 

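Chain B has the same format.

How do I print the predefined sequence? Does line[21:22] hold the chain ID? The chain ID can be anything from A to H. How do I define chain IDs A to H?

I cannot get the lines to print in order. Can anyone tell me how to print a predefined sequence in Python?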

After the answer:

I have updated the code above with the following:

n = 4
for chain, atoms in d.items():
    for atom, line in atoms.items():
        for i in range(len(atom)-n+1):
            for j in range(n):
                print d[chain][atomIDs[i+j]]
            print

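I want to extend this by two more segments, but I am not getting the expected output.

1 answer:

Answer 0: (score: 1)

Here are all my comments combined into one answer: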

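with open('1AHI.pdb') as pdbfile:
    for line in pdbfile:
        # Keep only HETATM records.
        if line[:6] != "HETATM":
            continue
        chainID = line[21:22]
        atomID = line[13:16].strip()
        # Keep only the chains and atoms of interest.
        if chainID not in ('A', 'B'):
            continue
        if atomID not in ('C4B', 'O4B', 'C1B', 'C2B', 'C3B'):
            continue
        # Use exactly one of the print forms below.
        ## Either (Python 3; the line already ends with a newline):
        print(line, end='')
        ## Or (Python 3):
        # print(line.rstrip(), end='\n')
        ## Or if Python2.x:
        # print line.rstrip()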
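
For the folder case in the question, a minimal Python 3 sketch might look like the one below. It assumes every input file ends in .pdb and that the output file is not itself a .pdb file inside the input folder; select_lines and filter_folder are hypothetical helper names, not part of any library. Two things are worth noting about the original attempt: os.walk yields file names rather than file contents, so each file still has to be opened, and os.walk silently yields nothing when the starting path does not exist, which is a likely reason for the empty output given the leading '/' in that path.

```python
import os

CHAIN_IDS = set("ABCDEFGH")                      # chains A to H
ATOM_IDS = {"C4B", "O4B", "C1B", "C2B", "C3B"}   # atoms of interest


def select_lines(lines, chain_ids=CHAIN_IDS, atom_ids=ATOM_IDS):
    """Return the HETATM lines whose chain ID and atom name are wanted."""
    selected = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        if line[21:22] not in chain_ids:          # column 22: chain ID
            continue
        if line[12:16].strip() not in atom_ids:   # columns 13-16: atom name
            continue
        selected.append(line.rstrip("\n"))
    return selected


def filter_folder(input_dir, output_path):
    """Write the selected lines of every .pdb file under input_dir to output_path."""
    with open(output_path, "w") as out:
        for path, dirs, files in os.walk(input_dir):
            for name in sorted(files):
                if not name.lower().endswith(".pdb"):
                    continue
                # os.walk gives names only; join with the folder and open the file.
                with open(os.path.join(path, name)) as pdbfile:
                    for line in select_lines(pdbfile):
                        out.write(line + "\n")
```

A call such as filter_folder(r'C:\Users\Vishnu\Desktop\Test_folder\Input', r'C:\Users\Vishnu\Desktop\Test_folder\output.txt') would then collect the matching lines from every file. The output file name is only an example, and the raw-string prefix r keeps the backslashes in the Windows path from being read as escape sequences.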

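The first lines of code I ever wrote, more than 10 years ago, were for parsing PDB files. Don't despair. You have a long and beautiful journey ahead of you.

P.S. I think mmCIF is now preferred over PDB... Make sure you read the specifications of both file formats.

I have updated the answer, but please note that this site is for solving specific problems, not for having other people do your work for you. That is generally frowned upon.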
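
As a rough illustration of what the PDB specification means by fixed columns, the sketch below slices one HETATM record by position. The column ranges follow the ATOM/HETATM record layout of the PDB format; parse_hetatm and the dictionary keys are hypothetical names chosen for this example. The atom name is read from columns 13 to 16 (line[12:16]), which also covers names that start one column earlier than C4B does.

```python
def parse_hetatm(line):
    """Split one ATOM/HETATM record into fields using fixed column positions."""
    return {
        "record": line[0:6].strip(),     # columns 1-6: record name
        "serial": int(line[6:11]),       # columns 7-11: atom serial number
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "resname": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21:22],            # column 22: chain ID
        "resseq": int(line[22:26]),      # columns 23-26: residue sequence number
        "x": float(line[30:38]),         # columns 31-38: x coordinate
        "y": float(line[38:46]),         # columns 39-46: y coordinate
        "z": float(line[46:54]),         # columns 47-54: z coordinate
    }
```

Because the fields are positional, splitting on whitespace is not a safe substitute: adjacent columns can run together without a space in between.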

d = {}
chainIDs = ('A', 'B',)
# C4B is repeated at the end so that a group can run from C3B back to C4B.
atomIDs = ('C4B', 'O4B', 'C1B', 'C2B', 'C3B', 'C4B')
with open('1AHI.pdb') as pdbfile:
    for line in map(str.rstrip, pdbfile):
        if line[:6] != "HETATM":
            continue
        chainID = line[21:22]
        atomID = line[13:16].strip()
        if chainID not in chainIDs:
            continue
        if atomID not in atomIDs:
            continue
        # Store each matching line under its chain ID and atom name.
        d.setdefault(chainID, {})[atomID] = line

n = 4  # number of lines in each printed group
for chainID in chainIDs:
    for i in range(len(atomIDs)-n+1):
        for j in range(n):
            print(d[chainID][atomIDs[i+j]])
        print()  # blank line between groups
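
With six entries in atomIDs and n = 4, the loop above prints three groups per chain, while the expected output in the question shows five. One possible way to get all five is to treat the atom names as a ring and wrap around with the modulo operator. The sketch below assumes the ring order C4B, O4B, C1B, C2B, C3B taken from the expected output, and assumes lines_by_atom is a dictionary like d[chainID] above; ring_windows and print_windows are hypothetical helper names.

```python
def ring_windows(atom_ids, n):
    """Yield every run of n consecutive names, wrapping around the end of the ring."""
    size = len(atom_ids)
    for start in range(size):
        yield [atom_ids[(start + offset) % size] for offset in range(n)]


def print_windows(lines_by_atom, atom_ids, n):
    """Print the stored line of each atom, one group per window, separated by blank lines."""
    for window in ring_windows(atom_ids, n):
        for atom_id in window:
            print(lines_by_atom[atom_id])
        print()
```

For example, print_windows(d['A'], ('C4B', 'O4B', 'C1B', 'C2B', 'C3B'), 4) would print the groups for chain A. Here the tuple lists each atom once, since the wrap-around replaces the repeated C4B entry.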