Question

I am trying to parse the following:

<delimiter><text><delimiter><text><delimiter>

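where delimiter can be any single literal character repeated three times, and text can be any run of printable characters other than the delimiter (the first and second occurrences of text do not have to match and may be empty).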

This is what I came up with, but text consumes everything from the first delimiter to the end of the string.

from pyparsing import Word, printables

delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)

parser = delimiter + text  # + delimiter + text + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)

Script output:

['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']
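This output follows from how Word works: Word(printables) also accepts the delimiter character, and a ~delimiter lookahead placed after it is only tested once Word has already stopped. A minimal check of that behaviour, assuming pyparsing 3.x (snake_case names such as parse_string and exclude_chars; pyparsing 2.x spells them parseString and excludeChars):

```python
from pyparsing import Word, printables

# printables covers every non-whitespace ASCII character, "_" included,
# so this keeps matching until the input runs out.
greedy = Word(printables)
print(greedy.parse_string("abc_123_").as_list())  # ['abc_123_']

# The negative lookahead is only checked after Word has finished; at the
# end of the input there is no next character, so the lookahead passes.
lookahead = Word(printables) + ~Word(printables, exact=1)
print(lookahead.parse_string("abc_123_").as_list())  # ['abc_123_']

# Removing the delimiter from the allowed characters stops the match at "_".
bounded = Word(printables, exclude_chars="_")
print(bounded.parse_string("abc_123_").as_list())  # ['abc']
```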

I think I need to use a Future, but I can't work out how to get the delimiter value matched at parse time into the text token.

Answer 1

Your intuition is correct: you need a Forward (not a Future) to capture the definition of text, because that definition cannot be fully known until parse time. Also, your use of Word must exclude the delimiter character with the excludeChars argument; just writing Word(printables) + ~delimiter is not enough, because Word(printables) has already consumed the delimiter by the time the ~delimiter lookahead is checked.
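A minimal sketch of those two points, assuming pyparsing 3.x (snake_case API). The helper define_from_delimiter and the name repeat_delim are hypothetical; this is one possible way to wire a Forward to a parse action, not a definitive implementation:

```python
from pyparsing import Forward, Literal, Optional, ParseException, Word, printables

# Any single printable character; whichever one appears first becomes
# the delimiter for the rest of this parse.
delimiter = Word(printables, exact=1)

# Placeholders: their real definitions depend on the delimiter, so they
# are filled in at parse time by the parse action below.
text = Forward()
repeat_delim = Forward()


def define_from_delimiter(tokens):
    """Hypothetical helper: redefine `text` and `repeat_delim` for the
    delimiter character that was just matched."""
    ch = tokens[0]
    # exclude_chars keeps Word from swallowing the delimiter itself;
    # Optional(..., default="") lets a text field be empty, as in "___".
    text << Optional(Word(printables, exclude_chars=ch), default="")
    # Later delimiters must be the same character as the first one.
    repeat_delim << Literal(ch)


delimiter.add_parse_action(define_from_delimiter)

parser = delimiter + text + repeat_delim + text + repeat_delim

tests = [
    ("_abc_123_", ["_", "abc", "_", "123", "_"]),
    ("-abc-123-", ["-", "abc", "-", "123", "-"]),
    ("___", ["_", "", "_", "", "_"]),
]

for test, expected in tests:
    result = parser.parse_string(test, parse_all=True).as_list()
    print(result, "<=>", expected)

# A different closing character is rejected instead of being absorbed.
try:
    parser.parse_string("_abc_123-", parse_all=True)
except ParseException as err:
    print("rejected:", err)
```

Because the parse action redefines the shared Forward objects on every parse, one parser handles a different delimiter per input string, but it is not safe to use from several threads at once. pyparsing's built-in match_previous_literal helper relies on the same Forward-plus-parse-action idea for repeating a previously matched literal.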

Here is your code with the necessary changes marked, and hopefully some helpful comments:

