Question

I want to parse a multi-line text file with the following content:

section1:
  key1 val1
  key2  val2

section2:
  val1
  val2
  val3

section3:

section4:
  somevalue

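The section headings (section1, section2, ...) are predefined. The goal is to read the values under each section. I am having trouble using the pyparsing module across multiple lines (the real problem is much more complex than this simple example).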

When I use the following code, the parser expects the complete list of defined keywords on each line:

# -*- coding: utf-8 -*-

from pyparsing import Literal, ZeroOrMore, LineEnd, ParseException

FileSyntax = None

def Grammar():

    #section1:
    section1 = Literal("section1:").suppress() + ZeroOrMore(LineEnd())
    #section2:
    section2 = Literal("section2:").suppress() + ZeroOrMore(LineEnd())
    #section3:
    section3 = Literal("section3:").suppress() + ZeroOrMore(LineEnd())
    #section4:
    section4 = Literal("section4:").suppress() + ZeroOrMore(LineEnd())

    return section1 + section2 + section3 + section4


def parseFile(filename : str):

    global FileSyntax

    print("\nparse results:\n")

    try:

        TestFile = open(filename)
        testdata = "".join( TestFile.readlines())
        FileSyntax = Grammar() 
        FileSyntax.parseString(testdata)

    except ParseException as err:

        print(err.line)
        print(" "*(err.column-1) + "^")
        print("* " + str(err))       

    except Exception as e:
        import traceback
        traceback.print_exc()

parseFile("testdata.txt")

How can I do stateful parsing (depending on the current section)? Thanks.

Answer 1

If you print out the grammar expression itself, you get something like this:

{{{{Suppress:("section1:") [LineEnd]...} {Suppress:("section2:") [LineEnd]...}} {Suppress:("section3:") [LineEnd]...}} {Suppress:("section4:") [LineEnd]...}}

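In other words, your grammar parses all of the section headers but none of the section bodies. So parsing will most likely fail on the first line after "section1:".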
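One way to also consume the section bodies is sketched below. This is only a minimal illustration, not the asker's real grammar: the `section\d+` header pattern, the rule that a body line is "any non-blank line that is not a header", and the names `SAMPLE`, `header`, `body_line` and `parse_sections` are all assumptions made for this example. It relies on pyparsing's default whitespace skipping (which includes newlines) and on a negative lookahead (`~header`) to stop a body at the next section.

```python
import pyparsing as pp

# Hypothetical sample input, modelled on the file shown in the question.
SAMPLE = """\
section1:
  key1 val1
  key2  val2

section2:
  val1
  val2
  val3

section3:

section4:
  somevalue
"""

# Assumed header shape: "section<N>:"; the colon is dropped from the results.
header = pp.Regex(r"section\d+") + pp.Suppress(":")
# A body line is the rest of any non-blank line that does not start a new section.
body_line = ~header + pp.Regex(r"[^\r\n]+")
# Each section becomes [name, [line, line, ...]]; a section may have no body.
section = pp.Group(header + pp.Group(pp.ZeroOrMore(body_line)))
grammar = pp.ZeroOrMore(section)


def parse_sections(text):
    """Return {section name: [whitespace-split tokens of each body line]}."""
    result = grammar.parseString(text, parseAll=True)
    return {name: [line.split() for line in lines] for name, lines in result.asList()}


for name, lines in parse_sections(SAMPLE).items():
    print(name, lines)
```

Because every section is grouped under its own name, what a line means can be decided per section afterwards (for example, treating lines in section1 as key/value pairs). An empty section such as section3 simply maps to an empty list. If the sections need different line grammars, each one could get its own body expression in place of the shared `body_line`.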

Also, there is no need to call readlines() and then join everything back together. Just call TestFile.read(). Or even better, pathlib.Path(test_file_name).read_text().
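A small sketch of that point, using a throwaway temporary file so it runs on its own. The file name `testdata.txt` comes from the question; the file contents, the variable names and the explicit `encoding="utf-8"` are assumptions for the demonstration.

```python
import tempfile
from pathlib import Path

# Hypothetical file contents, just for the comparison.
sample = "section1:\n  key1 val1\n\nsection2:\n  val1\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "testdata.txt"
    path.write_text(sample, encoding="utf-8")

    # The approach from the question: read the lines, then glue them back together.
    with open(path, encoding="utf-8") as test_file:
        joined = "".join(test_file.readlines())

    # The shorter equivalent: read the whole file in one call.
    direct = path.read_text(encoding="utf-8")

print(joined == direct)
```

Both variables hold the same string, so the resulting text can be passed to parseString() either way. read_text() also opens and closes the file itself, so no file handle is left open.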
