Question

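I'm having conceptual difficulty understanding how pyparsing parsers are built. The steps are: 1) build a parser by combining subclasses of ParserElement, and 2) use the parser to parse a string.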
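A minimal sketch of those two steps, added for illustration (the names word, quoted and pair are made up for this sketch). It also shows that QuotedString is itself a ParserElement that can be combined with `+`. It just does not wrap another element.

```python
from pyparsing import ParserElement, QuotedString, Word, alphas

# Step 1: build a parser by combining ParserElement objects.
word = Word(alphas)
quoted = QuotedString('"')      # QuotedString is itself a ParserElement
pair = word + quoted            # '+' combines elements into a new ParserElement (an And)

# Step 2: use the combined parser to parse a string.
result = pair.parseString('key "some value"')
print(result.asList())          # ['key', 'some value']
print(isinstance(quoted, ParserElement), isinstance(pair, ParserElement))  # True True
```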

The following example works correctly:

from pyparsing import Word, Literal, alphas, alphanums, delimitedList, QuotedString

name = Word(alphas+"_", alphanums+"_")
field = name
fieldlist = delimitedList(field)
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

dstring = '<Begin>abc,de34,f_o_o**End**'
print(doc.parseString(dstring))

It produces the expected token sequence:

['<Begin>', 'abc', 'de34', 'f_o_o', '**End**']

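But (for example) the QuotedString class does not take a ParserElement as an argument, so it cannot be used to build a parser this way. I would like to use it in the example above, like this: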

name = Word(alphas+"_", alphanums+"_")
field = QuotedString(name)     ### Wrong: doesn't allow "name" as an argument
fieldlist = delimitedList(field)

to parse documents of the form:

dstring = '<Begin>"abc", "de34", "f_o_o"**End**'

But since it can't be used this way, what is the correct syntax for including QuotedString when constructing a parser for a list of quoted strings?

======== EDIT ========

See the answer below...

Answer 1

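QuotedString can't be used for this task. But the Or operator (^) can achieve the same effect: it allows different forms of quoting while keeping the ability to validate the string enclosed in the quotes. The following code does this: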

from pyparsing import Word, Literal, alphas, alphanums, delimitedList
from pyparsing import Group, QuotedString, ParseException, Suppress

name = Word(alphas+"_", alphanums+"_")
field = (Suppress('"') + name + Suppress('"')         # double quote
         ^ Suppress("'") + name + Suppress("'")       # single quote
         ^ Suppress("<") + name + Suppress(">")       # html tag
         ^ Suppress("{{") + name + Suppress("}}"))    # django template variable
fieldlist = Group(delimitedList(field))
doc = Literal('<Begin>') + fieldlist + Literal('**End**')

dstring = [
    '<Begin>"abc","de34","f_o_o"**End**',      # Good
    '<Begin><abc>,{{de34}},\'f_o_o\'**End**',  # Good
    '<Begin>"abc",\'de34","f_o_o\'**End**',    # Bad - mismatched quotes
    '<Begin>"abc","de34","f_o#o"**End**',      # Bad - invalid identifier
]

for ds in dstring:
    print(ds)
    try:
        print('  ', doc.parseString(ds))
    except ParseException as err:
        print(" "*(err.column-1) + "^")
        print(err)

This produces the desired output, accepting the two good test strings and rejecting the two bad ones:

<Begin>"abc","de34","f_o_o"**End**
   ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin><abc>,{{de34}},'f_o_o'**End**
   ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']
<Begin>"abc",'de34","f_o_o'**End**
            ^
Expected "**End**" (at char 12), (line:1, col:13)
<Begin>"abc","de34","f_o#o"**End**
                   ^
Expected "**End**" (at char 19), (line:1, col:20)
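The four alternatives above share one shape, so they can also be generated by a small function. The sketch below assumes a hypothetical helper named quoted (not part of pyparsing) and passes the alternatives to Or explicitly, which is the class the ^ operator builds: it tries every alternative and keeps the longest match.

```python
from pyparsing import (Group, Literal, Or, ParseException, Suppress, Word,
                       alphanums, alphas, delimitedList)

name = Word(alphas + "_", alphanums + "_")

def quoted(expr, open_q, close_q=None):
    """Hypothetical helper: expr wrapped in open/close delimiters that are suppressed."""
    return Suppress(open_q) + expr + Suppress(close_q or open_q)

# Equivalent to chaining the alternatives with '^'.
field = Or([
    quoted(name, '"'),
    quoted(name, "'"),
    quoted(name, "<", ">"),
    quoted(name, "{{", "}}"),
])
doc = Literal('<Begin>') + Group(delimitedList(field)) + Literal('**End**')

print(doc.parseString('<Begin><abc>,{{de34}},\'f_o_o\'**End**').asList())
# ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']

try:
    doc.parseString('<Begin>"abc","de34","f_o#o"**End**')
except ParseException as err:
    print("rejected at column", err.column)
```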

Thanks, Paul, for the help and for making such a cool package.

Answer 2

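I think you have a slight misunderstanding of how to use QuotedString. The argument passed to QuotedString is not the string inside the quotes; it is the character to use as the quote character. This way you can define a quoted string that uses '*' as the quote, or '=' as the quote, or '<' and '>' as the opening and closing quote characters. In your example, just use this definition for field: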

field = QuotedString('"')

Also, don't be afraid to use Python's built-in help() function to read the docstrings of classes, modules, methods, etc.

EDIT:

QuotedString('X') does not parse the string "X"; it parses text enclosed in a matching pair of X characters, such as Xsome charactersX.
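A small check of that behavior, added as an illustrative sketch (x_quoted is a made-up name). The quoted contents are returned without the delimiters, and a lone 'X' fails because the closing delimiter is missing.

```python
from pyparsing import ParseException, QuotedString

x_quoted = QuotedString('X')

# The text between a pair of 'X' characters is returned, without the delimiters.
print(x_quoted.parseString('Xsome charactersX')[0])   # some characters

# A lone 'X' is only an opening delimiter, so it does not match.
try:
    x_quoted.parseString('X')
except ParseException:
    print("a lone 'X' is rejected")
```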

Here is your complete (working) example program:

from pyparsing import QuotedString, delimitedList, Group

dstring = '<Begin>"abc", "de34", "f_o_o"**End**'
field = QuotedString('"')
parser = "<Begin>" + Group(delimitedList(field)) + "**End**"

print(parser.parseString(dstring))

For me this prints:

['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']

If you get an exception after copying/pasting and running this example, please post the full exception.
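Unlike Answer 1, this version accepts any text between the double quotes. If the contents should also be valid identifiers, one option is to attach a condition to the QuotedString. The sketch below is an assumption-based combination of the two answers: it uses addCondition with a regex (IDENT, a made-up name) that mirrors Word(alphas+"_", alphanums+"_").

```python
import re
from pyparsing import Group, ParseException, QuotedString, delimitedList

# Same identifier rule as Word(alphas+"_", alphanums+"_"), written as a regex.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

# Accept a double-quoted string only if its (unquoted) contents form an identifier.
field = QuotedString('"').addCondition(lambda toks: IDENT.match(toks[0]) is not None)
parser = "<Begin>" + Group(delimitedList(field)) + "**End**"

print(parser.parseString('<Begin>"abc", "de34", "f_o_o"**End**').asList())
# ['<Begin>', ['abc', 'de34', 'f_o_o'], '**End**']

try:
    parser.parseString('<Begin>"abc", "de34", "f_o#o"**End**')
except ParseException as err:
    print("rejected:", err)
```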

More examples:

from pyparsing import QuotedString

starQuoteString = QuotedString('*')
eqQuoteString = QuotedString('=')
tildeQuoteString = QuotedString('~')
angleQuoteString = QuotedString('<', endQuoteChar='>')

fullSample = starQuoteString + eqQuoteString + tildeQuoteString + angleQuoteString

print(fullSample.parseString("""
    *a string quoted with stars*
    =a very long quoted string, contained within equal signs=
    ~not a very long string at all~<another quoted string on the same line>
    """))

prints:

['a string quoted with stars', 'a very long quoted string, contained within equal signs', 'not a very long string at all', 'another quoted string on the same line']

You are not even limited to single characters. You could use QuotedString('**') to parse the closing **End**, but it would also accept **The End**, **Finis**, or **That's all folks!**.

Using QuotedString in pyparsing
