How to read files from an input directory in Python and save output files with the same names in an output directory

Asked: 2016-10-15 15:10:54

Tags: python python-2.7 python-3.x bioinformatics

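I want to set up an input directory for my code, read the input files from it, and save each output file under the same name as its input file in a separate output directory.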

Script:

import sys
import glob
import errno
import os


d = {}
chainIDs = ('A', 'B')
atomIDs = ('C4B', 'O4B', 'C1B', 'C2B', 'C3B', 'C4B', 'O4B', 'C1B')
count = 0
for doc in os.listdir('/C:/Users/Vishnu/Desktop/Test_folder/Input'):
doc1 = "doc_path" + doc
doc2 = "/C:/Users/Vishnu/Desktop/Test_folder/Output" + doc1
if doc1.endswith(".pdb"):
with open(doc) as pdbfile:
       single_line = ''.join([line for line in f])
       single_space = ' '.join(single_line.split())
       for line in map(str.rstrip, pdbfile):
            if line[:6] != "HETATM":
                continue
            chainID = line[21:22]
            atomID = line[13:16].strip()
            if chainID not in chainIDs:
                continue
            if atomID not in atomIDs:
                continue
            try:
                d[chainID][atomID] = line
            except KeyError:
                d[chainID] = {atomID: line}

    n = 4
    for chainID in chainIDs:
        for i in range(len(atomIDs)-n+1):
            for j in range(n):
                   with open(doc2.format(count) , "w") as doc2:
                         doc2.write(d[chainID][atomIDs[i+j]])
                         count += 1   

else:
continue

I get an error when I run the code above. I am new to Python and still learning; can anyone help? The error:

with open(doc) as pdbfile:
    ^
IndentationError: expected an indented block
>>> 

Input file:

HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15252  C4B NAD B 501      36.455 115.053  36.671  1.00 11.25           C  
HETATM15253  O4B NAD B 501      35.930 114.469  35.492  1.00 11.25           O  
HETATM15254  C3B NAD B 501      35.307 115.837  37.367  1.00 11.25           C  
HETATM15256  C2B NAD B 501      34.172 114.876  37.039  1.00 11.25           C  
HETATM15258  C1B NAD B 501      34.524 114.613  35.551  1.00 11.25           C  
HETATM15297  C4B NAD C 501      98.229 130.106  18.332  1.00 12.28           C  
HETATM15298  O4B NAD C 501      98.083 131.545  18.199  1.00 12.28           O  
HETATM15299  C3B NAD C 501      99.346 129.675  17.343  1.00 12.28           C  
HETATM15301  C2B NAD C 501     100.220 130.922  17.375  1.00 12.28           C  
HETATM15303  C1B NAD C 501      99.125 132.008  17.317  1.00 12.28           C  
HETATM15342  C4B NAD D 501      77.335 156.939  25.788  1.00 11.99           C  
HETATM15343  O4B NAD D 501      78.705 156.544  25.901  1.00 11.99           O  
HETATM15344  C3B NAD D 501      77.106 158.059  26.824  1.00 11.99           C  
HETATM15346  C2B NAD D 501      78.536 158.632  26.878  1.00 11.99           C  
HETATM15348  C1B NAD D 501      79.351 157.345  26.900  1.00 11.99           C  

Column 2 holds the atom name (C4B, O4B, ...), and column 4 (A, B, C, D) is the chain ID.

Expected output for each chain ID. Chain IDs can range from A to Z, but are mostly A to H:

For chain A:

HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C 
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C   

HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  

HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  
HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  

HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C  
HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  

HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C  
HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C  
HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O  
HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C  

2 Answers:

Answer 0 (score: 0)

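The IndentationError appears because the lines below with open(doc) as pdbfile: seem to be indented by two tabs. More directly, the traceback points at the with line itself: it follows if doc1.endswith(".pdb"): and therefore has to be indented one level deeper than that if, just as every line inside the for loop has to be indented under the for.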

Hope this helps!
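To make the expected nesting concrete, here is a minimal sketch. The function name count_hetatm_lines and the demo files are hypothetical and live in a temporary folder, not in the asker's paths. Each of for, if, and with ends in a colon, so the line after each one sits one level deeper.

```python
import os
import tempfile


def count_hetatm_lines(input_dir):
    """Return {filename: number of HETATM lines} for every .pdb file."""
    counts = {}
    for name in sorted(os.listdir(input_dir)):
        if name.endswith(".pdb"):                  # body of the for loop
            path = os.path.join(input_dir, name)   # body of the if
            with open(path) as pdbfile:            # still inside the if
                counts[name] = sum(                # body of the with
                    1 for line in pdbfile if line.startswith("HETATM")
                )
    return counts


# Hypothetical demo data written to a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "example.pdb"), "w") as f:
    f.write("HETATM15207  C4B NAD A 501\nATOM      1  N   MET A   1\n")
with open(os.path.join(demo_dir, "notes.txt"), "w") as f:
    f.write("HETATM line in a file that is skipped because of its extension\n")

print(count_hetatm_lines(demo_dir))
```

Files that do not end in .pdb simply fall through the if, so no else: continue branch is needed.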

Answer 1 (score: 0)

import os

chainIDs = ('A', 'B')
# Ring atoms, repeated so that every group of n atoms can wrap around.
atomIDs = ('C4B', 'O4B', 'C1B', 'C2B', 'C3B', 'C4B', 'O4B', 'C1B')
n = 4  # number of atoms per output group
doc_path = r'C:\Users\Vishnu\Desktop\Test_folder\Input'
tar_path = r'C:\Users\Vishnu\Desktop\Test_folder\Output'

for doc in os.listdir(doc_path):
    if not doc.endswith(".pdb"):
        continue
    doc1 = os.path.join(doc_path, doc)  # input file
    doc2 = os.path.join(tar_path, doc)  # output file with the same name
    print(doc1, doc2)

    d = {}  # reset for every file: chainID -> {atomID: line}
    with open(doc1) as pdbfile:
        for line in map(str.rstrip, pdbfile):
            if line[:6] != "HETATM":
                continue
            chainID = line[21:22]
            atomID = line[13:16].strip()
            if chainID not in chainIDs or atomID not in atomIDs:
                continue
            d.setdefault(chainID, {})[atomID] = line

    # Open the output once per input file. Reopening it with "w" for every
    # line would truncate the file and keep only the last line written.
    with open(doc2, "w") as s:
        for chainID in chainIDs:
            if chainID not in d:
                continue  # this chain does not occur in the file
            for i in range(len(atomIDs) - n + 1):
                for j in range(n):
                    # rstrip removed the newline above, so add it back
                    s.write(d[chainID][atomIDs[i + j]] + "\n")
                s.write("\n")  # blank line between groups
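
The grouping in the expected output can also be read as a sliding window over the five ring atoms. The following is a sketch under stated assumptions, not a definitive implementation: the names RING, ring_windows, collect_hetatm, write_windows, and process_folder are made up for illustration, chain IDs are collected from each file instead of being fixed to A and B, Python 3 is assumed, and os.makedirs is called on the assumption that the output folder may not exist yet. The demo writes the chain A lines from the question to a temporary folder.

```python
import os
import tempfile

RING = ("C4B", "O4B", "C1B", "C2B", "C3B")  # ring atoms in output order


def ring_windows(atoms, n=4):
    """Yield every run of n consecutive atoms, wrapping around the ring."""
    for start in range(len(atoms)):
        yield [atoms[(start + k) % len(atoms)] for k in range(n)]


def collect_hetatm(lines):
    """Map chain ID -> {atom name: line} for HETATM records of ring atoms."""
    table = {}
    for line in lines:
        line = line.rstrip()
        if not line.startswith("HETATM"):
            continue
        chain_id, atom_id = line[21:22], line[13:16].strip()
        if atom_id in RING:
            table.setdefault(chain_id, {})[atom_id] = line
    return table


def write_windows(src_path, dst_path, n=4):
    """Write the atom windows of every chain in src_path to dst_path."""
    with open(src_path) as src:
        table = collect_hetatm(src)
    with open(dst_path, "w") as dst:
        for chain_id in sorted(table):
            for window in ring_windows(RING, n):
                # Skip a window if the file lacks one of its atoms.
                if all(atom in table[chain_id] for atom in window):
                    dst.writelines(table[chain_id][a] + "\n" for a in window)
                    dst.write("\n")  # blank line between groups


def process_folder(input_dir, output_dir):
    """Process each .pdb file and save the result under the same name."""
    os.makedirs(output_dir, exist_ok=True)
    for name in os.listdir(input_dir):
        if name.endswith(".pdb"):
            write_windows(os.path.join(input_dir, name),
                          os.path.join(output_dir, name))


# Demo with the chain A lines from the question, in temporary folders.
sample = [
    "HETATM15207  C4B NAD A 501      47.266 101.038   7.214  1.00 11.48           C",
    "HETATM15208  O4B NAD A 501      46.466 100.713   8.371  1.00 11.48           O",
    "HETATM15209  C3B NAD A 501      47.659  99.689   6.567  1.00 11.48           C",
    "HETATM15211  C2B NAD A 501      46.447  98.835   6.988  1.00 11.48           C",
    "HETATM15213  C1B NAD A 501      46.221  99.300   8.426  1.00 11.48           C",
]
demo_in = tempfile.mkdtemp()
demo_out = os.path.join(tempfile.mkdtemp(), "Output")
with open(os.path.join(demo_in, "sample.pdb"), "w") as f:
    f.write("\n".join(sample) + "\n")

process_folder(demo_in, demo_out)
print(os.listdir(demo_out))
```

Taking the window start modulo the ring length avoids having to repeat atom names in the tuple, and collecting the chain IDs from the file covers chains beyond A and B without changing the code.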