I have a string output that looks like this:
Distance AAAB: ,0.13634,0.13700,0.00080,0.00080,-0.00066,.00001,
Distance AAAC: ,0.12617,0.12680,0.00080,0.00080,-0.00063,,
Distance AAAD: ,0.17045,0.16990,0.00080,0.00080,0.00055,,
Distance AAAE: ,0.09330,0.09320,0.00080,0.00080,0.00010,,
Distance AAAF: ,0.21048,0.21100,0.00080,0.00080,-0.00052,,
Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
Distance AAAI: ,0.10811,0.10860,0.00080,0.00070,-0.00049,,
Distance AAAJ: ,0.02430,0.02400,0.00200,0.00200,0.00030,,
Distance AAAK: ,0.09449,0.09400,0.00200,0.00100,0.00049,,
Distance AAAL: ,0.07689,0.07660,0.00050,0.00050,0.00029,
What I want to do is extract one specific set of data from this block, for example only the line for Distance AAAH:
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
Each measurement always starts with Distance AAA*: where the asterisk stands for the only character that changes.
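Complications: This needs to be generic, because I have many different data sets and the measurements vary between projects, so I cannot assume a measurement always appears in the same position relative to its neighbors (Distance AAAG, Distance AAAI, and so on). I also cannot rely on len(), because the last field is sometimes blank (as it is for Distance AAAH) and sometimes filled (as it is for Distance AAAB). And I don't think I can use .find(), because I need all of the numbers that follow Distance AAAH.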
I'm still quite new to this. I did my best to find a solution to a similar problem, but had no luck.
Answer 0: (score: 1)
You can use the re module. Wrapping the search in a function makes it easy to reuse.
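import re
def SearchDistance(pattern, text):
    # Let any whitespace character match where the label has a space
    pattern = pattern.replace(' ', r'\s')
    # .+ grabs the rest of the line after the label
    print(re.findall(r'{0}.+'.format(pattern), text))
# a holds the full output string shown in the question
SearchDistance('Distance AAAH', a)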
Output:
['Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,']
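If you need the numbers themselves rather than the whole line, here is a minimal sketch of one way to get them. The helper name `extract_values` and the two-line `sample` string are my own for illustration, not from the question. It escapes the label, anchors it to the start of a line, and then splits the rest of the line on commas, skipping blank fields so it does not matter whether the last column is filled.

```python
import re

# Two lines copied from the question; `sample` is an illustrative name.
sample = (
    "Distance AAAB: ,0.13634,0.13700,0.00080,0.00080,-0.00066,.00001,\n"
    "Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,\n"
)

def extract_values(label, text):
    """Return the numeric fields that follow `label` as floats, or None if absent."""
    # re.escape guards against special characters in the label;
    # re.MULTILINE lets ^ and $ match at every line boundary.
    match = re.search(r'^{0}:(.*)$'.format(re.escape(label)), text, re.MULTILINE)
    if match is None:
        return None
    # Blank fields (for example the empty trailing columns) are skipped.
    return [float(field) for field in match.group(1).split(',') if field.strip()]

print(extract_values('Distance AAAH', sample))
```

Returning None for a missing label lets the caller tell "not measured in this project" apart from "measured but all fields blank", which would come back as an empty list.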
Answer 1: (score: 1)
You can search the text with this script:
# fullText = YOUR STRING
text = fullText.splitlines()
for line in text:
    # Keep only the line for the requested measurement
    if line.startswith('Distance AAAH:'):
        print(line)
Output: Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
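Since the question mentions many data sets whose measurements differ, it may be handier to parse the block once and look labels up afterwards. The following is only a sketch under that assumption. `parse_measurements` and `sample_text` are hypothetical names, and the fields are kept as strings so nothing is lost in conversion.

```python
def parse_measurements(full_text):
    """Map each 'Distance XXXX' label to its list of non-blank string fields."""
    measurements = {}
    for line in full_text.splitlines():
        # partition splits on the first colon only; sep is '' if there is none
        label, sep, rest = line.partition(':')
        if not sep or not label.startswith('Distance '):
            continue
        # Dropping blank fields means a trailing ",," or "," makes no difference
        measurements[label] = [f.strip() for f in rest.split(',') if f.strip()]
    return measurements

# Three lines copied from the question; `sample_text` is an illustrative name.
sample_text = (
    "Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,\n"
    "Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,\n"
    "Distance AAAL: ,0.07689,0.07660,0.00050,0.00050,0.00029,\n"
)

data = parse_measurements(sample_text)
# .get returns None when a project does not contain the measurement
print(data.get('Distance AAAH'))
```

Because the lookup goes by label, the order of the lines and which measurements are present in a given project do not matter.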