Question

I'm trying to parse lines of data and then group them into lists.

Here is my script:

from pyparsing import *

data = """START
line 2
line 3
line 4
END
START
line a
line b
line c
END
"""

EOL = LineEnd().suppress()
start = Keyword('START').suppress() + EOL
end = Keyword('END').suppress() + EOL

line = SkipTo(LineEnd()) + EOL
lines = start + OneOrMore(start | end | Group(line))

start.setDebug()
end.setDebug()
line.setDebug()

result = lines.parseString(data)
results_list = result.asList()

print(results_list)

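This code was inspired by another Stack Overflow question: Matching nonempty lines with pyparsing

What I need is to parse everything from START to END line by line and save it into one list per group (everything from a START to its matching END is one group). However, this script puts every line into its own group.

The result is: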

[['line 2'], ['line 3'], ['line 4'], ['line a'], ['line b'], ['line c'], ['']]

I would like it to be:

[['line 2', 'line 3', 'line 4'], ['line a', 'line b', 'line c']]

It also parses an empty string at the end.

I'm a pyparsing beginner, so I'm asking for your help.

Thanks

Answer 1

You could use nestedExpr to find the text delimited by START and END.

If you use

In [322]: pp.nestedExpr('START', 'END').searchString(data).asList()
Out[322]: 
[[['line', '2', 'line', '3', 'line', '4']],
 [['line', 'a', 'line', 'b', 'line', 'c']]]

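then the text gets split on whitespace. (Notice that we get 'line', '2' where we want 'line 2'; we would rather split only on '\n'.) To fix this, we can use the content parameter of pp.nestedExpr, which lets us control what counts as an item in the nested list. The source code of nestedExpr defines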

content = (Combine(OneOrMore(~ignoreExpr + 
                ~Literal(opener) + ~Literal(closer) +
                CharsNotIn(ParserElement.DEFAULT_WHITE_CHARS,exact=1))
            ).setParseAction(lambda t:t[0].strip()))

By default, pp.ParserElement.DEFAULT_WHITE_CHARS is

In [324]: pp.ParserElement.DEFAULT_WHITE_CHARS
Out[324]: ' \n\t\r'

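This is what causes nestedExpr to split on all whitespace. So if we reduce that set to just '\n', nestedExpr splits the content into lines rather than on every whitespace character.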
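To see that difference in isolation, here is a simplified sketch (assuming pyparsing is installed). It leaves out the Combine/exact=1 wrapping and the opener/closer lookaheads, and just compares an item that stops at any default whitespace character with one that stops only at '\n':

```python
import pyparsing as pp

text = "line 2\nline 3"

# Stops at any default whitespace character (space, newline, tab, carriage return),
# the same character set used by nestedExpr's default content.
by_whitespace = pp.CharsNotIn(pp.ParserElement.DEFAULT_WHITE_CHARS)

# Stops only at a newline, so the space inside "line 2" is kept.
by_newline = pp.CharsNotIn("\n")

print(by_whitespace.parseString(text)[0])  # line
print(by_newline.parseString(text)[0])     # line 2
```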

import pyparsing as pp

data = """START
line 2
line 3
line 4
END
START
line a
line b
line c
END
"""

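opener = 'START'
closer = 'END'
# An item is a run of non-newline characters; the lookaheads stop it wherever
# the opener or closer text begins, so each line becomes one item
# instead of one item per whitespace-separated word.
content = pp.Combine(pp.OneOrMore(~pp.Literal(opener)
                                  + ~pp.Literal(closer)
                                  + pp.CharsNotIn('\n', exact=1)))
expr = pp.nestedExpr(opener, closer, content=content)

# Each searchString match holds a single group, so item[0] unwraps it.
result = [item[0] for item in expr.searchString(data).asList()]
print(result)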

yields

[['line 2', 'line 3', 'line 4'], ['line a', 'line b', 'line c']]
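If you would rather keep the line-by-line grammar from the question, the key change is to put the Group around the whole START ... END block instead of around each line (Group(line) is what produces one group per line). The following is a sketch of that alternative, not the nestedExpr approach above. It assumes pyparsing is installed, uses Keyword so that a line such as ENDING is not mistaken for END, and relies on the default whitespace skipping (which includes '\n') instead of explicit LineEnd tokens:

```python
import pyparsing as pp

data = """START
line 2
line 3
line 4
END
START
line a
line b
line c
END
"""

# START/END markers are matched as whole keywords and dropped from the output.
start = pp.Suppress(pp.Keyword("START"))
end = pp.Suppress(pp.Keyword("END"))

# A body line is any non-empty run of characters up to the newline,
# as long as it is not the END keyword. The newline ending the previous
# line is skipped as ordinary whitespace before the Regex is tried.
body_line = ~end + pp.Regex(r"[^\n]+")

# Group the whole block, not each line, giving one list per START...END pair.
block = pp.Group(start + pp.ZeroOrMore(body_line) + end)

result = pp.OneOrMore(block).parseString(data, parseAll=True).asList()
print(result)
```

Because every body line must contain at least one character and parseAll=True consumes the trailing newline, no empty string is produced at the end.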
