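How can I use Python to parse multiple lines inside a specific pattern in a text file?

Time: 2017-03-23 10:23:18

Tags: python parsing

I asked a similar question before, but I'm not good at this, so I'm asking again.

Here is a sample textfile.txt: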

    dummy01234567890
    0987654321dummy 
    -------start-------(It is possible to modify)
    text line1
    text line2
    -------end---------(It is possible to modify)
    12345678910
    qwertyuiop        
    -------start-------(It is possible to modify)
    text line3
    text line4
    -------end---------(It is possible to modify)
    ;p12309809128309123
    dummyline1235567

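I want to parse it into:

"text line1\ntext line2" → array[0]

"text line3\ntext line4" → array[1]

How should I code this in Python?

Should I use the split function twice?

2 answers:

Answer 0: (score: 0)

A finite-state machine is flexible enough to cover most needs.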

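state = 'init'
arrays = []
with open('textfile.txt') as f:
    lines = []
    for line in f:
        stripped = line.strip()
        # A marker line begins with dashes; what follows the dashes names it.
        # Checking the prefix tolerates trailing text after the marker.
        marker = stripped.lstrip('-') if stripped.startswith('-') else ''
        if state == 'init':  # look for a start marker
            if not marker.startswith('start'):
                continue
            state = 'start'
            lines = []
        elif state == 'start':  # inside a block: collect lines
            if not marker.startswith('end'):
                lines.append(stripped)
                continue
            # end marker reached: store the finished block
            arrays.append('\n'.join(lines))
            state = 'init'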
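
As an alternative to the state machine, the same blocks can be pulled out with a single regular expression. This is only a minimal sketch: the names `SAMPLE`, `BLOCK_RE` and `extract_blocks` are made up for illustration, and it assumes each marker is a run of dashes followed by the word `start` or `end`, with anything allowed after that on the same line.

```python
import re

# Hypothetical sample that mirrors textfile.txt from the question.
SAMPLE = """\
dummy01234567890
0987654321dummy
-------start-------(It is possible to modify)
text line1
text line2
-------end---------(It is possible to modify)
12345678910
qwertyuiop
-------start-------(It is possible to modify)
text line3
text line4
-------end---------(It is possible to modify)
;p12309809128309123
dummyline1235567
"""

# Assumed marker shape: optional indentation, dashes, then "start" or "end".
BLOCK_RE = re.compile(
    r"^[ \t]*-+start\b[^\n]*\n"  # the whole start marker line
    r"(.*?)"                      # block body, matched as short as possible
    r"^[ \t]*-+end\b[^\n]*$",    # the whole end marker line
    re.DOTALL | re.MULTILINE,
)


def extract_blocks(text):
    """Return one newline-joined string per start/end block."""
    blocks = []
    for body in BLOCK_RE.findall(text):
        lines = [ln.strip() for ln in body.splitlines()]
        blocks.append("\n".join(lines))
    return blocks


arrays = extract_blocks(SAMPLE)
print(arrays)
```

The regex version is shorter, but the state machine is probably easier to extend if the markers or the per-block handling get more complicated.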

答案 1 :(得分:0)

You can do something like this to get the desired result:

text = """dummy01234567890
    0987654321dummy 
    -------start-------(It is possible to modify)
    text line1
    text line2
    -------end---------(It is possible to modify)
    12345678910
    qwertyuiop        
    -------start-------(It is possible to modify)
    text line3
    text line4
    -------end---------(It is possible to modify)
    ;p12309809128309123
    dummyline1235567"""

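text_list = text.splitlines()
# The layout repeats every 6 lines; the wanted lines sit at offsets 3 and 4.
# range() and // keep this runnable on Python 3.
print(['\n'.join([text_list[3 + i * 6].strip(), text_list[4 + i * 6].strip()])
       for i in range(len(text_list) // 6)])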
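
The fixed offsets only work while every block sits in the same six-line layout. For the "split twice" idea raised in the question, here is a rough sketch that does not depend on line positions. The function name `split_blocks`, its default marker strings and the `sample` variable are assumptions made for this example, and it expects every start marker to be followed by an end marker.

```python
def split_blocks(text, start="-------start-------", end="-------end---------"):
    """Return the text between each start/end marker pair, one string per block."""
    blocks = []
    # First split: each chunk begins right after a start marker.
    # The piece before the first marker is skipped.
    for chunk in text.split(start)[1:]:
        # Second split: keep only what comes before the end marker.
        body = chunk.split(end)[0]
        # The first line of body is the remainder of the start marker line; drop it.
        lines = body.splitlines()[1:]
        blocks.append("\n".join(ln.strip() for ln in lines if ln.strip()))
    return blocks


# Hypothetical sample that mirrors textfile.txt from the question.
sample = """dummy01234567890
    0987654321dummy
    -------start-------(It is possible to modify)
    text line1
    text line2
    -------end---------(It is possible to modify)
    12345678910
    qwertyuiop
    -------start-------(It is possible to modify)
    text line3
    text line4
    -------end---------(It is possible to modify)
    ;p12309809128309123
    dummyline1235567"""

result = split_blocks(sample)
print(result)
```

So yes, two splits are enough here: one on the start marker and one on the end marker, as long as the marker strings are known in advance.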