Extracting part of a string in Python, with constraints

Asked: 2016-08-24 19:59:42

Tags: python python-3.x

I have a string output that looks like this:

Distance AAAB: ,0.13634,0.13700,0.00080,0.00080,-0.00066,.00001,
Distance AAAC: ,0.12617,0.12680,0.00080,0.00080,-0.00063,,
Distance AAAD: ,0.17045,0.16990,0.00080,0.00080,0.00055,,
Distance AAAE: ,0.09330,0.09320,0.00080,0.00080,0.00010,,
Distance AAAF: ,0.21048,0.21100,0.00080,0.00080,-0.00052,,
Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,
Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
Distance AAAI: ,0.10811,0.10860,0.00080,0.00070,-0.00049,,
Distance AAAJ: ,0.02430,0.02400,0.00200,0.00200,0.00030,,
Distance AAAK: ,0.09449,0.09400,0.00200,0.00100,0.00049,,
Distance AAAL: ,0.07689,0.07660,0.00050,0.00050,0.00029,

What I want to do is extract one specific set of data from this block, for example only Distance AAAH, like this:

Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,

A measurement will always start with Distance AAA*: where the asterisk is the only character that changes.

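Complications: This needs to be generic, because I have many different data sets, so Distance AAAH may not always sit next to Distance AAAG or Distance AAAI; the measurements vary from project to project. I also can't rely on the length (len()), because the final field can sometimes be blank (as it is for Distance AAAH) or populated (as it is for Distance AAAB). I don't think I can use .find() either, since I need all of the numbers that follow Distance AAAH.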

I'm still quite new to this. I did my best to find a solution to a similar problem, but had no luck.

2 answers:

Answer 0: (score: 1)

You can use the re module. Wrapping it in a function should be convenient.

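import re

def SearchDistance(pattern, text):
    # Let any whitespace character match where the label contains a space
    pattern = pattern.replace(' ', r'\s')
    # '.+' consumes the rest of the line (it does not match a newline)
    print(re.findall(r'{0}.+'.format(pattern), text))

# a holds the full multi-line string from the question
SearchDistance('Distance AAAH', a)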

Output:

['Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,']
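A variant of the same idea, offered only as a sketch: it returns the matching line instead of printing it, escapes the label with re.escape, and anchors the match to the start of a line with re.MULTILINE. The function name find_distance_line and the shortened sample string are hypothetical and not part of the original answer.

```python
import re

# Hypothetical sample: a shortened copy of the block from the question.
sample = (
    "Distance AAAG: ,0.02518,0.02540,0.00040,0.00040,-0.00022,,\n"
    "Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,\n"
    "Distance AAAI: ,0.10811,0.10860,0.00080,0.00070,-0.00049,,\n"
)

def find_distance_line(label, text):
    """Return the first line that starts with `label` plus ':', or None."""
    # re.escape keeps any special characters in the label literal;
    # re.MULTILINE lets ^ and $ match at the start and end of every line.
    match = re.search(r'^{0}:.*$'.format(re.escape(label)), text, re.MULTILINE)
    return match.group(0) if match else None

print(find_distance_line('Distance AAAH', sample))
```

Because the match is anchored to the label rather than to a position, it does not matter which measurements come before or after the one requested, or whether the trailing field is blank.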

Answer 1: (score: 1)

You can search the text with this script:

# fullText = YOUR STRING
text = fullText.splitlines()
for line in text:
    # Keep only the line whose label matches exactly
    if line.startswith('Distance AAAH:'):
        print(line)

Output: Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,
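Since the question asks for all the numbers after the label and notes that the last field is sometimes blank, here is a minimal sketch of turning a matched line into floats. It assumes the fields are comma-separated exactly as in the sample data; parse_distance_line is a hypothetical helper, not something from either answer.

```python
def parse_distance_line(line):
    """Split 'Distance XXXX: ,a,b,...' into (label, list of floats)."""
    # Everything before the first ':' is the label, the rest is the data.
    label, _, rest = line.partition(':')
    # Empty fields (the leading comma, blank trailing columns) are skipped.
    values = [float(field) for field in rest.split(',') if field.strip()]
    return label, values

label, values = parse_distance_line(
    'Distance AAAH: ,0.11404,0.11450,0.00120,0.00110,-0.00046,,')
print(label, values)
```

Skipping empty fields means the result has five values for Distance AAAH and six for Distance AAAB. If column positions matter, the blank fields could be kept as None instead of being dropped.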