ATOM 856 CE ALYS A 104 0.809 0.146 26.161 0.54 29.14 C
ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O
ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C
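I have the text file above and need to make two text files from it based on the character at line[21]. I wrote a script that prints the desired result. But what should I do if I don't know in advance which characters occur at that position? Below is the script I tried. Suppose I don't know whether line[21] is "A" and "B", or "B" and "G", or any other combination, and the file still has to be split on the basis of line[21]. How can I do this?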
import sys

for fn in sys.argv[1:]:
    f = open(fn, 'r')
    while 1:
        line = f.readline()
        if not line: break
        if line[21:22] == 'B':
            chns = line[0:80]
            print(chns)
Answer 0 (score: 1)
Use str.split and compare the 5th whitespace-separated element (i.e. the 21st character).
while 1:
    line = f.readline()
    if not line:
        break
    # get character in 5th column
    ch = line.split()[4]
    if ch == 'B':
        chns = line[0:80]
        print(chns)
    else:  # not sure what the character is
        pass  # do something
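The else branch above is where an unknown character would have to be handled. One way to avoid hard-coding 'B' is to collect the characters that actually occur before deciding how to split. The following is only a minimal sketch, assuming fixed-width lines with the character of interest at index 21; the helper name distinct_keys is made up for illustration.

```python
def distinct_keys(lines, index=21):
    """Return the distinct characters found at `index`, in first-seen order."""
    seen = []
    for line in lines:
        # Slicing instead of indexing avoids an IndexError on short lines.
        key = line.rstrip("\n")[index:index + 1]
        if key and key not in seen:
            seen.append(key)
    return seen
```

Calling distinct_keys(open(fn)) for each file name would then give the list of characters to loop over in place of the literal 'B'.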
Answer 1 (score: 1)
You can initialise the value to None and check whether it changes:
import sys

for fn in sys.argv[1:]:
    old = None
    with open(fn, 'r') as f:
        for line in f:
            # keep only the lines whose 21st character matches the first one seen
            if (old is None) or (line[21] == old):
                old = line[21]
                chns = line[0:80]
                print(chns)
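Answer 2 (score: 1)
I'm not sure what you are trying to achieve, but the code below groups the lines from all files by their 21st character in the dictionary lines.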
import sys

lines = dict()
for fn in sys.argv[1:]:
    with open(fn, 'r') as f:
        for line in f:
            # 5th whitespace-separated field, i.e. the 21st character
            key = line.split()[4]
            if key not in lines:
                lines[key] = list()
            lines[key].append(line)
You can then use lines.keys() to get every 21st character that occurs, and look up in the dictionary the list of all lines that carry it.
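To go from the grouped dictionary to separate files, each list can be written out under a name derived from its key. This is a hedged sketch rather than part of the answer: it keys on the fixed position line[21] instead of str.split, and the function names group_by_char and write_groups as well as the prefix parameter are assumptions chosen for the example.

```python
from collections import defaultdict


def group_by_char(lines, index=21):
    """Map each character found at `index` to the list of lines carrying it."""
    groups = defaultdict(list)
    for line in lines:
        if len(line.rstrip("\n")) <= index:
            continue  # too short to carry a character at `index`
        groups[line[index]].append(line)
    return groups


def write_groups(groups, prefix="file"):
    """Write every group to its own file, e.g. fileA.txt and fileB.txt."""
    names = []
    for key, members in groups.items():
        name = prefix + key + ".txt"
        with open(name, "w") as out:
            out.writelines(members)
        names.append(name)
    return names
```

Because the keys are discovered while reading, nothing needs to be known in advance about which characters occur.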
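Answer 3 (score: 1)
Store the 21st character of the previous line, then print a blank line at every mismatch (which marks the start of another group). This prints the lines grouped by their 21st character.
Note that this groups lines only according to their order in the file, so unsorted lines will produce several separate groups for the same 21st character.
Modified file to show this case: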
ATOM 856 CE ALYS A 104 0.809 0.146 26.161 0.54 29.14 C
ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C
ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O
Code that produces this case (the lines are not sorted):
import sys

for fn in sys.argv[1:]:
    with open(fn, 'r') as file:
        prev = None
        for line in file:
            line = line.strip()
            if line[21:22] != prev:
                # new line separator for each group
                print('')
            print(line)
            prev = line[21:22]
Sample output showing this case:

ATOM 856 CE ALYS A 104 0.809 0.146 26.161 0.54 29.14 C
ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C

ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C

ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O
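The same "adjacent lines only" behaviour can also be expressed with itertools.groupby from the standard library, which likewise starts a new group whenever the key changes. The sketch below is an illustration under that assumption, not the answer's own code; consecutive_runs is a hypothetical helper name.

```python
from itertools import groupby


def consecutive_runs(lines, index=21):
    """Return (key, lines) pairs for each run of adjacent lines sharing a key."""
    def key_of(line):
        return line[index:index + 1]

    return [(key, list(run)) for key, run in groupby(lines, key=key_of)]


def print_runs(lines, index=21):
    """Print the runs, each preceded by a blank line."""
    for _key, run in consecutive_runs(lines, index):
        print("")
        for line in run:
            print(line.rstrip("\n"))
```

On unsorted input a key such as "A" shows up in more than one run, which is the effect visible in the sample output; sorting the lines by the same key first leaves one run per character.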
So if you want only one group for each 21st character, putting all the lines in a list and sorting it with list.sort() will do.
Code (sorts the lines before grouping):
import sys

for fn in sys.argv[1:]:
    with open(fn, 'r') as file:
        lines = file.readlines()
    # creates a list of pairs (21st char, line)
    lines = [[line[21:22], line.strip()] for line in lines]
    # sorts lines based on key (21st char)
    lines.sort()
    # brings the list of lines back to its original form;
    # the order is not reverted since it is already sorted
    lines = [line[1] for line in lines]
    prev = None
    for line in lines:
        if line[21:22] != prev:
            # new line separator for each group
            print('')
        print(line)
        prev = line[21:22]
Output:

ATOM 856 CE ALYS A 104 0.809 0.146 26.161 0.54 29.14 C
ATOM 857 CE BLYS A 104 0.984 -0.018 26.394 0.46 31.19 C
ATOM 858 NZ ALYS A 104 1.988 0.923 26.662 0.54 33.17 N
ATOM 859 NZ BLYS A 104 1.708 0.302 27.659 0.46 37.61 N
ATOM 860 OXT LYS A 104 -0.726 -6.025 27.180 1.00 26.53 O

ATOM 862 N LYS B 276 17.010 -16.138 9.618 1.00 41.00 N
ATOM 863 CA LYS B 276 16.764 -16.524 11.005 1.00 31.05 C
ATOM 864 C LYS B 276 16.428 -15.306 11.884 1.00 26.93 C
ATOM 865 O LYS B 276 16.258 -15.447 13.090 1.00 29.67 O
ATOM 866 CB LYS B 276 17.863 -17.347 11.617 1.00 33.62 C

Edit

Writing the grouped lines to different files does not actually require checking the previous line's value, because changing the filename according to the 21st character opens a new file and so separates the lines. Here, however, I used prev so that a previously created file with the same filename is not appended to, which could leave the file contents mixed up or inconsistent.

import sys

for fn in sys.argv[1:]:
    with open(fn, 'r') as file:
        lines = file.readlines()
    # creates a list of pairs (21st char, line)
    lines = [[line[21:22], line] for line in lines]
    # sorts lines based on key (21st char)
    lines.sort()
    # brings the list of lines back to its original form;
    # the order is not reverted since it is already sorted
    lines = [line[1] for line in lines]
    filename = 'file'
    prev = None
    for line in lines:
        if line[21:22] != prev:
            # creates a new file (truncating any existing one)
            mode = 'w'
        else:
            # appends to the file
            mode = 'a'
        with open(filename + line[21:22] + '.txt', mode) as out:
            out.write(line)
        prev = line[21:22]

The file-writing part can be simplified if appending to previously created files is not a problem. However, that risks writing to a file with the same name that was not created by the script, or that was created by the script during an earlier run or session.
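As a variation on the file-writing code, the sketch below opens each output file exactly once in 'w' mode and keeps the open handles in a dictionary. Under that approach no sorting or prev bookkeeping is needed, and a leftover file from an earlier run is overwritten rather than appended to. The function name split_by_char and the prefix parameter are illustrative assumptions, not part of the original answer.

```python
def split_by_char(lines, prefix="file", index=21):
    """Write each line to <prefix><char>.txt, opening every file only once."""
    handles = {}
    try:
        for line in lines:
            if len(line.rstrip("\n")) <= index:
                continue  # too short to carry a character at `index`
            key = line[index]
            if key not in handles:
                # 'w' truncates anything left over from an earlier run
                handles[key] = open(prefix + key + ".txt", "w")
            handles[key].write(line)
    finally:
        # close every file even if writing failed part-way through
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

The return value is the sorted list of characters that were found, which can be handy for reporting which files were produced. Lines keep their original relative order inside each output file.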