I have a file from which I want to retrieve specific information.
First, I extracted the lines of interest and put them in a list:
array = []
# "with" closes the file automatically once the loop finishes
with open("Traj.pdb", "r") as file_in:
    for line in file_in:
        # keep only the TITLE records
        if line.startswith('TITLE'):
            array.append(line)
I end up with a list like this:
['TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500000.00000\n', 'TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500020.00000\n', 'TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500040.00000\n']
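I want to extract the "t=" information (t= 1500000.00000, t= 1500020.00000, t= 1500040.00000, and so on), but I don't know how. As you can see, the elements of the list are whole sentences, and I'm a bit confused about how to retrieve a specific piece of information in that case. Any help or advice would be much appreciated.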
Answer 0 (score: 0)
Try this:
import re
array_with_elements_i_want = []
for elem in array:
    # group(0) is the whole match: "t= " plus the rest of the line (without the newline)
    array_with_elements_i_want.append(re.search(r"t= .*", elem).group(0))
Or, even better, skip lines with no match (re.search returns None for those):
import re
array_with_elements_i_want = []
for elem in array:
    v = re.search(r"t= .*", elem)
    # only keep lines that actually contain "t= "
    if v is not None:
        array_with_elements_i_want.append(v.group(0))
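If only the number is needed rather than the whole "t= ..." text, a capture group can isolate it. The following is a minimal sketch, assuming the time is always written as a plain decimal number after t=. The names TIME_RE and extract_times and the shortened sample lines are made up for illustration and are not part of the question.

```python
import re

# Hypothetical sample lines, shortened from the TITLE records in the question
sample = [
    "TITLE Protein in INSANE! t= 1500000.00000\n",
    "TITLE Protein in INSANE! t= 1500020.00000\n",
    "REMARK no time stamp here\n",
]

# Assumed format: "t=", optional whitespace, then a decimal number
TIME_RE = re.compile(r"t=\s*(\d+(?:\.\d+)?)")

def extract_times(lines):
    """Return the time of every line that has a t= field, as floats."""
    times = []
    for line in lines:
        match = TIME_RE.search(line)
        # lines without a t= field are skipped
        if match is not None:
            times.append(float(match.group(1)))
    return times

print(extract_times(sample))
```

Converting to float is optional. Keep match.group(1) as a string if the exact formatting of the time stamp matters.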
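Answer 1 (score: 0)
Split each line on t= , which gives you a left half and a right half. You only need the right half, i.e. whatever comes after t= . Also make sure to remove the newline character \n at the end of the right half.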
l = ['TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500000.00000\n', 'TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500020.00000\n', 'TITLE Protein in INSANE! Membrane UpperLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 LowerLeaflet>POPC:POPE:CHOL=31.0:41.0:28.0 t= 1500040.00000\n']
result = []
for line in l:
    splitted_arr = line.split("t= ")
    # Consider only the right half
    splitted_arr = splitted_arr[1]
    # Remove the trailing newline, if any
    splitted_arr = splitted_arr.rstrip("\n")
    result.append(splitted_arr)
print(result)
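A possible variant of the same idea uses str.partition, which always returns three parts and so does not raise an IndexError on a line that lacks t=. This is only a sketch: time_field and the shortened sample lines are hypothetical, and it assumes the first "t=" on a line is the time stamp.

```python
def time_field(line):
    """Return the text after the first 't=', stripped, or None if 't=' is absent."""
    _, sep, right = line.partition("t=")
    if not sep:
        # partition returns an empty separator when "t=" was not found
        return None
    return right.strip()

# Hypothetical sample lines, shortened from the list above
lines = [
    "TITLE Protein in INSANE! t= 1500000.00000\n",
    "TITLE Protein in INSANE! t= 1500020.00000\n",
    "TITLE line without a time stamp\n",
]

result = []
for line in lines:
    value = time_field(line)
    if value is not None:
        result.append(value)

print(result)
```

Here strip() removes both the space after t= and the trailing newline in one step.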