Question

I have participant survey data that contains, for each variable: the variable name, its value in that observation, and the conditions required for that question to have been asked (if earlier answers establish that a question is not applicable, the participant won't be prompted). One of my tasks is to differentiate the blanks that mean N/A from the blanks that represent an asked-but-unanswered prompt. Unfortunately, the export function in our data capture tool doesn't offer this feature.

To get around this, I'm comparing each variable's branching conditions against the recorded observations to see whether the prompt should have been displayed. That's probably confusing, so as an example, imagine a record for Subject A:

Variable Name | Observation Value | Branching Logic
foo           |         5         | 
bar           |         2         | foo != 2
baz           |         7         | foo < 10 or bar == 5

The prompt for foo shows up no matter what. The prompt for bar shows because foo = 5 satisfies its condition foo != 2, and baz shows too, because foo = 5 satisfies foo < 10. I'm working with the data as a pandas DataFrame, but for now I'm using a dict to represent the test data while I build a toy version of the module. I almost have it working but am missing one piece: nested parentheses.
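To make the end goal concrete, here is a minimal sketch of the blank classification described above, under a few assumptions: the record is a hypothetical variation of Subject A (baz left blank, plus an extra variable qux), None stands for a blank cell, and an empty branching string means "always asked". The condition evaluator is a stand-in that simply calls eval with builtins disabled (the question's own checkCond also relies on eval); the parser built below would replace was_asked.

```python
# Hypothetical record: variable -> (observed value, branching logic).
# None stands for a blank cell; "" means the question is always asked.
record = {
    "foo": (5, ""),
    "bar": (2, "foo != 2"),
    "baz": (None, "foo < 10 or bar == 5"),
    "qux": (None, "foo == 2"),
}


def was_asked(logic, values):
    """Return True if the branching logic is empty or satisfied."""
    if not logic.strip():
        return True
    # Stand-in evaluator: these conditions happen to be valid Python,
    # so evaluate them against the observed values with builtins disabled.
    return bool(eval(logic, {"__builtins__": {}}, values))


def classify(record):
    """Label each variable as answered, unanswered (asked but blank) or n/a."""
    values = {name: val for name, (val, _) in record.items()}
    status = {}
    for name, (val, logic) in record.items():
        if val is not None:
            status[name] = "answered"
        elif was_asked(logic, values):
            status[name] = "unanswered"
        else:
            status[name] = "n/a"
    return status


print(classify(record))
# {'foo': 'answered', 'bar': 'answered', 'baz': 'unanswered', 'qux': 'n/a'}
```

One caveat of this sketch: a condition that compares a blank (None) variable with < or > would raise a TypeError in Python 3, so real data would need a policy for that case.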

There are a lot of similar questions (e.g. pyparsing and line breaks), and I found a very similar example in the PyParsing documentation that handles logical notation, but I'm not great at Python and had trouble following its use of multiple classes, child classes, etc. I used it as a jumping-off point for the following:

import pyparsing as pp
test_data = {
    'a' : 3,
    'b' : 6,
    'c' : 2,
    'd' : 4 
    }

# Functions applied by parser
def toInt(x):
    return [int(k) for k in x]
def useKey(x):
    try: return [test_data[k] for k in x]
    except KeyError: print("Value not a key:", x)
def checkCond(parsed):
    allinone = parsed[0]
    print("Condition:", allinone)
    humpty = " ".join([str(x) for x in allinone])
    return eval(humpty)

# Building the parser
key = pp.Word(pp.alphanums + '_')('key')
op = pp.oneOf('> >= == != <= <')('op')
val = pp.Word(pp.nums + '-')('value')
joint = pp.oneOf("and or")
key.setParseAction(useKey)
val.setParseAction(toInt)
cond = pp.Group(key + op + val)('condition')
cond.addParseAction(checkCond)
logic = cond + pp.Optional(joint) + pp.Optional(cond)

# Tests
if __name__ == "__main__":
    tests = [
        ("a == 5", False),
        ("b < 3", False),
        ("c > 1", True),
        ("d != 2", True),
        ("a >= 1", True),
        ("b <= 5", False),
        ("a <= 6 and b == 2", False),
        ("a <= 6 or b == 2", True)]
        #("b > 2 and (a == 3 or d > 2 or c < 1)", True)]
    for expr, res in tests:
        print(expr)
        out = logic.parseString(expr)
        out = " ".join([str(x) for x in out])
        out = bool(eval(out))
        if bool(out) == bool(res):
            print("PASS\n")
        else: print("FAIL\n", 
            "Got:", bool(out), 
            "\nExpected:",bool(res), "\n")

After a lot of trial and error I'm getting the results I expected out of this. Notice that the last test is commented out, though; if you uncomment that and run it, you get:

b > 2 and (a == 3 or d > 2 or c < 1)
Condition: [6, '>', 2]
Traceback (most recent call last):
  File "testdat/pptutorial.py", line 191, in <module>
    out = bool(eval(out))
  File "<string>", line 1
    True and
           ^
SyntaxError: unexpected EOF while parsing

I'm sure it's something very silly that I'm missing, but for the life of me I can't figure this piece out. It seems as if the parentheses make the parser think a new statement is starting. Other answers suggest looking for empty values, printing out the individual tokens, etc., but I haven't had any luck that way. My guess is that it has something to do with how I've set up the groups in the parser. I've never built one before, so this is definitely uncharted territory for me! Thanks a ton for any help, and let me know if there's more information I can provide.

Answer 1

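No part of your grammar allows parentheses in the input, which is why pyparsing stops parsing when it reaches one. Because parseString does not require the whole string to match by default, it simply returns the tokens matched so far (True, 'and'), and eval then fails on the dangling "and".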
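The silent partial match is easy to see with a stripped-down copy of the question's grammar. This sketch is an assumption-laden simplification: the parse actions are removed (so raw tokens stay visible) and key is reduced to pp.Word(pp.alphas). It shows the prefix that parseString returns, and how parseAll=True turns the leftover "(" into an explicit ParseException instead of a confusing eval error.

```python
import pyparsing as pp

# Simplified copy of the question's grammar: no parse actions, no parentheses.
key = pp.Word(pp.alphas)
op = pp.oneOf("> >= == != <= <")
val = pp.Word(pp.nums)
cond = key + op + val
logic = cond + pp.Optional(pp.oneOf("and or")) + pp.Optional(cond)

expr = "b > 2 and (a == 3 or d > 2 or c < 1)"

# Without parseAll, pyparsing returns whatever prefix matched and ignores
# the rest, so the trailing "and" has nothing after it.
print(logic.parseString(expr))  # ['b', '>', '2', 'and']

# With parseAll=True the unparsed "(" is reported instead of dropped.
try:
    logic.parseString(expr, parseAll=True)
except pp.ParseException as err:
    print("ParseException:", err)
```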

You can allow parentheses with a slight adjustment to your definition of logic:

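# Placeholder that is filled in later with <<=, so it can be used recursively
cond_chain_with_parentheses = pp.Forward()
# A condition, optionally followed by and/or and another (possibly parenthesised) chain
cond_chain = cond + pp.Optional(joint + cond_chain_with_parentheses)
cond_chain_with_parentheses <<= cond_chain | '(' + cond_chain + ')'

# StringEnd makes parsing fail unless the whole input was consumed
logic = cond_chain_with_parentheses + pp.StringEnd()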

Here I've used a forward declaration of cond_chain_with_parentheses, which allows me to use it in the grammar definition even though it isn't defined yet. I've also added StringEnd so that an exception is thrown if the whole input can't be parsed.

This grammar correctly parses all of your inputs:

>>> logic.parseString("b > 2 and (a == 3 or d > 2 or c < 1)")
Condition: [6, '>', 2]
Condition: [3, '==', 3]
Condition: [4, '>', 2]
Condition: [2, '<', 1]
([True, 'and', '(', True, 'or', True, 'or', False, ')'], {'condition': [True, True, True, False]})
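For convenience, here is a sketch that drops the answer's grammar into the question's toy setup so the full test list, including the nested case, runs in one go. Two choices are assumptions on my part rather than part of the answer: the parse actions are written as lambdas, and the question's eval-based checkCond is swapped for a lookup in Python's standard operator module. The names OPS, evaluate and TESTS are hypothetical.

```python
import operator

import pyparsing as pp

# Toy record, as in the question.
test_data = {'a': 3, 'b': 6, 'c': 2, 'd': 4}

# Swap-in for the question's eval-based checkCond: map each comparison
# string to the matching function from the operator module.
OPS = {'>': operator.gt, '>=': operator.ge, '==': operator.eq,
       '!=': operator.ne, '<=': operator.le, '<': operator.lt}

# Leaf grammar from the question: names become recorded values, numbers ints.
key = pp.Word(pp.alphanums + '_').setParseAction(lambda t: [test_data[k] for k in t])
op = pp.oneOf('> >= == != <= <')
val = pp.Word(pp.nums + '-').setParseAction(lambda t: [int(k) for k in t])
joint = pp.oneOf('and or')

# Each "key op value" group collapses to a single True/False token.
cond = pp.Group(key + op + val).setParseAction(
    lambda t: OPS[t[0][1]](t[0][0], t[0][2]))

# The answer's recursive grammar.
cond_chain_with_parentheses = pp.Forward()
cond_chain = cond + pp.Optional(joint + cond_chain_with_parentheses)
cond_chain_with_parentheses <<= cond_chain | '(' + cond_chain + ')'
logic = cond_chain_with_parentheses + pp.StringEnd()


def evaluate(expr):
    """Parse a branching-logic string and return whether it holds."""
    tokens = logic.parseString(expr)
    # Only True/False, and/or and parentheses remain, so joining them gives
    # a plain Python boolean expression, evaluated as in the question.
    return bool(eval(" ".join(str(t) for t in tokens)))


TESTS = [
    ("a == 5", False),
    ("b < 3", False),
    ("c > 1", True),
    ("d != 2", True),
    ("a >= 1", True),
    ("b <= 5", False),
    ("a <= 6 and b == 2", False),
    ("a <= 6 or b == 2", True),
    ("b > 2 and (a == 3 or d > 2 or c < 1)", True),
]

if __name__ == "__main__":
    for expr, expected in TESTS:
        print("PASS" if evaluate(expr) == expected else "FAIL", "|", expr)
```

Note that with this grammar a parenthesised group can only close out a chain: an input such as "(a == 3 or b > 2) and c < 1" still raises a ParseException, because StringEnd finds " and c < 1" left over. If parentheses can appear anywhere, pyparsing's infixNotation helper is the more general tool.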

PyParsing and nested parens: unexpected EOF error
