Python: searching for the next occurrence of a substring

Asked: 2013-04-16 21:55:32

Tags: python string search substring

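I am sending a message multiple times, each copy wrapped in a preamble and a postamble. I want to be able to extract the message that sits between a valid preamble and a valid postamble. My code is: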

print(msgfile[msgfile.find(preamble) + len(preamble):msgfile.find(postamble, msgfile.find(preamble))])

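The problem is that if the postamble is corrupted, it prints all the data between the first valid preamble and the next valid postamble. An example of the received text file: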

garbagePREAMBLEmessagePOSTcMBLEgarbage
garbagePRdAMBLEmessagePOSTAMBLEgarbage
garbagePREAMBLEmessagePOSTAMBLEgarbage

It prints:

messagePOSTcMBLEgarbage
garbagePRdAMBLEmessage

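But what I really want it to print is the message from the third line, because that one has both a valid preamble and a valid postamble. So I think what I need is a way to find, and index from, the next instance of a substring. Is there an easy way to do this?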

Edit: I don't expect my data to arrive in nice discrete lines. I only formatted it that way to make it easier to see.

3 answers:

Answer 0 (score: 0)

Process it line by line:

>>> preamble, postamble = "PREAMBLE", "POSTAMBLE"
>>> test = "garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
>>> test += "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
>>> test += "garbagePREAMBLEmessagePOSTAMBLEgarbage\n"
>>> for line in test.splitlines():
...     start = line.find(preamble)
...     # look for the postamble only after the preamble
...     end = line.find(postamble, start + len(preamble))
...     if start != -1 and end != -1:
...         print(line[start + len(preamble):end])
...
message
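The question's edit says the data does not actually arrive in separate lines, so here is a rough sketch of the same `str.find` idea applied to one continuous string. It relies on an assumption that is not in the question: that a payload is never longer than some known bound (`max_len` here). Without a limit like that, a valid preamble followed much later by a valid postamble cannot be told apart from a real frame. The function name `extract_messages` and the value 20 are made up for illustration.

```python
def extract_messages(data, preamble, postamble, max_len):
    """Yield payloads framed by an intact preamble/postamble pair.

    max_len is an assumed upper bound on the payload length.
    """
    pos = 0
    while True:
        # the second argument of str.find is where the search starts,
        # which is how we move on to the "next instance" of a substring
        start = data.find(preamble, pos)
        if start == -1:
            return
        body = start + len(preamble)
        end = data.find(postamble, body)
        if end == -1:
            return  # no intact postamble left anywhere after this point
        if end - body <= max_len:
            yield data[body:end]
            pos = end + len(postamble)
        else:
            # the postamble is too far away, so it probably belongs to a
            # later frame; retry from the next preamble instead
            pos = body


stream = ("garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
          "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
          "garbagePREAMBLEmessagePOSTAMBLEgarbage\n")
print(list(extract_messages(stream, "PREAMBLE", "POSTAMBLE", max_len=20)))
# prints ['message']
```

If the payload length is fixed rather than merely bounded, the check could be tightened to `end - body == expected_len`.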

Answer 1 (score: 0)

import re

lines = ["garbagePREAMBLEmessagePOSTcMBLEgarbage",
        "garbagePRdAMBLEmessagePOSTAMBLEgarbage",
        "garbagePREAMBLEmessagePOSTAMBLEgarbage"]

# you can use regex
my_regex = re.compile("garbagePREAMBLE(.*?)POSTAMBLEgarbage")

# get the match found between the preambles and print it
for line in lines:
    found = my_regex.match(line)
    # if there is a match print it
    if found:
        print(found.group(1))

# you can use string slicing
def validate(pre, post, messages):
    for line in messages:
        # slicing would misbehave on a string shorter than prefix + suffix
        if len(line) < len(pre) + len(post):
            print("error: line is too short")
            continue

        # see if the line starts with the prefix and ends with the suffix
        if line.startswith(pre) and line.endswith(post):
            # print the message between them
            print(line[len(pre):-len(post)])

validate("garbagePREAMBLE", "POSTAMBLEgarbage", lines)

Answer 2 (score: 0)

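Is every message on a single line? If so, you can use a regular expression to identify the lines that have a valid preamble and postamble: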

import re

pat = re.compile('PREAMBLE(.+)POSTAMBLE')
with open(yourfilename) as input_file:
    # search each line once and keep only the lines that matched
    matches = (pat.search(line) for line in input_file)
    messages = [m.group(1) for m in matches if m]

print(messages)
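
If the messages are not one per line (as the question's edit suggests), a regex can still be tried on the whole buffer. The sketch below is only one possible approach and assumes, beyond what the question states, that a payload never exceeds `MAX_LEN` characters. `MAX_LEN`, `frame` and `find_messages` are names invented for this example.

```python
import re

MAX_LEN = 20  # assumed upper bound on payload length, not from the question

# re.DOTALL lets "." match newlines too, so the buffer need not be split
# into lines; the bounded lazy quantifier stops a preamble from pairing
# with a postamble that is further away than MAX_LEN characters
frame = re.compile(r"PREAMBLE(.{0,%d}?)POSTAMBLE" % MAX_LEN, re.DOTALL)


def find_messages(data):
    """Return every payload enclosed by an intact preamble and postamble."""
    return [m.group(1) for m in frame.finditer(data)]


data = ("garbagePREAMBLEmessagePOSTcMBLEgarbage\n"
        "garbagePRdAMBLEmessagePOSTAMBLEgarbage\n"
        "garbagePREAMBLEmessagePOSTAMBLEgarbage\n")
print(find_messages(data))
# prints ['message']
```

Note that this does not guard against a payload that itself contains the preamble text; if that can happen, the pattern would need to be tightened.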