In PyParsing, how do I ignore lines that may start with whitespace?

Asked: 2017-05-25 10:48:13

Tags: python pyparsing

I am trying to parse a file like the following (which I have named foo_badging.txt):

package: name='com.sec.android.app.camera.shootingmode.dual' versionCode='6' versionName='1.003' platformBuildVersionName='5.0.1-1624448'
sdkVersion:'17'
uses-permission: name='android.permission.CAMERA'
application-icon-640:'res/mipmap-xxhdpi-v4/application_manager_camera_mode_ic_dual_camera.png'
application: label='Dual camera' icon='res/mipmap-hdpi-v4/application_manager_camera_mode_ic_dual_camera.png'
feature-group: label=''
  uses-feature: name='android.hardware.camera'
  uses-implied-feature: name='android.hardware.camera' reason='requested android.permission.CAMERA permission'
  uses-feature: name='android.hardware.touchscreen'
  uses-implied-feature: name='android.hardware.touchscreen' reason='default feature for all apps'
other-activities
supports-screens: 'small' 'normal' 'large' 'xlarge'
supports-any-density: 'true'
locales: '--_--' 'ca' 'da' 'fa' 'ga' 'ja' 'pa' 'nb' 'be' 'de' 'ne' 'bg' 'mg' 'tg' 'th' 'xh' 'fi' 'hi' 'si' 'vi' 'sk' 'tk' 'uk' 'el' 'nl' 'pl' 'sl' 'tl' 'bn' 'in' 'ko' 'ro' 'sq' 'ar' 'fr' 'hr' 'or' 'sr' 'tr' 'as' 'cs' 'it' 'lt' 'gu' 'hu' 'ru' 'zu' 'lv' 'sv' 'iw' 'fr-CA' 'lo-LA' 'bn-BD' 'et-EE' 'ka-GE' 'ky-KG' 'my-ZG' 'km-KH' 'en-PH' 'zh-HK' 'mk-MK' 'ur-PK' 'hy-AM' 'my-MM' 'zh-CN' 'ta-IN' 'te-IN' 'ml-IN' 'bn-IN' 'kn-IN' 'mr-IN' 'mn-MN' 'pl-SP' 'pt-BR' 'gl-ES' 'es-ES' 'eu-ES' 'is-IS' 'en-US' 'es-US' 'pt-PT' 'zh-TW' 'ms-MY' 'az-AZ' 'kk-KZ' 'uz-UZ'
densities: '160' '240' '320' '480' '640'

I want to parse the first few lines (package, sdkVersion) first, then skip lines until I reach the supports-screens line. Here is what I have so far:

from pyparsing import Literal, QuotedString, LineEnd, Optional, OneOrMore, LineStart, Regex, White

with open('foo_badging.txt') as fp:
    badging = fp.read()

package_name = "name=" + QuotedString(quoteChar="'")("name")
versionCode = "versionCode=" + QuotedString(quoteChar="'")("versionCode")
versionName = "versionName=" + QuotedString(quoteChar="'")("versionName")
platformBuildVersionName = "platformBuildVersionName=" + QuotedString(quoteChar="'")("platformBuildVersionName")
sdkVersion = "sdkVersion:" + QuotedString(quoteChar="'")("sdkVersion")
targetSdkVersion = "targetSdkVersion:" + QuotedString(quoteChar="'")("targetSdkVersion")

not_supports_screens_line = LineStart() + Regex(r"(?!supports-screens:).*")     # Negative lookahead assertion for a line starting with "supports-screens:"

supports_screens = "supports-screens:" + QuotedString(quoteChar="'")("supports_screens")

expression = Literal("package:") + package_name + versionCode + versionName + platformBuildVersionName + LineEnd() \
                + Optional(sdkVersion + LineEnd()) \
                + Optional(targetSdkVersion + LineEnd()) \
                + OneOrMore(not_supports_screens_line) \
                + supports_screens + LineEnd()

tokens = expression.parseString(badging)

The problem is that I get a ParseException at the indented uses-feature line:

Traceback (most recent call last):
  File "/home/kurt/Documents/Scratch/apk_checker/apk_check.py", line 82, in <module>
    tokens = expression.parseString(badging)
  File "/usr/local/lib/python2.7/dist-packages/pyparsing.py", line 1632, in parseString
    raise exc
pyparsing.ParseException: Expected "supports-screens:" (at char 435), (line:7, col:3)

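Apparently this indented line does not count as a not_supports_screens_line, presumably because, unlike the other lines, it starts with two spaces. I have tried changing the Regex to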

not_supports_screens_line = LineStart() + Regex(r"\s*(?!supports-screens:).*")

(with a leading \s*), as well as to

not_supports_screens_line = LineStart() + Optional(White()) + Regex(r"(?!supports-screens:).*")

but in both cases I still get the same error message. How can I make not_supports_screens_line match these indented lines as well?
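As a quick sanity check, the two regular expressions from the question can be tried on their own with the standard re module, outside of pyparsing. The sketch below is only an illustration: the variable names are made up, and the two sample lines are copied from foo_badging.txt. Both patterns accept the indented line and reject the supports-screens line, which hints that the regex itself is not what rejects the indented line. The interaction between LineStart and pyparsing's whitespace skipping is a more likely suspect, although that is an inference and not something the traceback confirms.

```python
import re

# Two lines copied from foo_badging.txt: one indented, one that must NOT match.
indented = "  uses-feature: name='android.hardware.camera'"
target = "supports-screens: 'small' 'normal' 'large' 'xlarge'"

# The two regex variants tried in the question.
patterns = [
    r"(?!supports-screens:).*",
    r"\s*(?!supports-screens:).*",
]

# For each pattern, record (matches indented line, matches supports-screens line).
# re.match anchors at the start of the string, like a regex applied at a fixed position.
results = {
    pattern: (bool(re.match(pattern, indented)), bool(re.match(pattern, target)))
    for pattern in patterns
}

for pattern, (hits_indented, hits_target) in results.items():
    print(pattern, hits_indented, hits_target)
```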

1 answer:

Answer 0: (score: 0)

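Following a comment by Paul McGuire, I used SkipTo so that I did not have to build a complicated negative-lookahead expression for the lines I am not interested in. Here is the resulting code: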

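from pyparsing import Literal, QuotedString, LineEnd, Optional, LineStart, SkipTo

def convert_to_int(tokens):
    # Parse action: replace the matched string token with its integer value
    return int(tokens[0])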

with open('foo_badging.txt') as fp:
    badging = fp.read()

package_name = "name=" + QuotedString(quoteChar="'")("name")
versionCode = "versionCode=" + QuotedString(quoteChar="'")("versionCode").setParseAction(convert_to_int)
versionName = "versionName=" + QuotedString(quoteChar="'")("versionName")
platformBuildVersionName = "platformBuildVersionName=" + QuotedString(quoteChar="'")("platformBuildVersionName")
sdkVersion = "sdkVersion:" + QuotedString(quoteChar="'")("sdkVersion").setParseAction(convert_to_int)
targetSdkVersion = "targetSdkVersion:" + QuotedString(quoteChar="'")("targetSdkVersion").setParseAction(convert_to_int)

supports_screens = LineStart() + "supports-screens:" + QuotedString(quoteChar="'")("supports_screens")

expression = Literal("package:") + package_name + versionCode + versionName + platformBuildVersionName + LineEnd() \
                + Optional(sdkVersion + LineEnd()) \
                + Optional(targetSdkVersion + LineEnd()) \
                + SkipTo("supports-screens:") + supports_screens

tokens = expression.parseString(badging)

print(tokens.asDict())

which prints

{'sdkVersion': 17, 'name': 'com.sec.android.app.camera.shootingmode.dual', 'platformBuildVersionName': '5.0.1-1624448', 'supports_screens': 'small', 'versionName': '1.003', 'versionCode': 6}

which contains the supports_screens field, as desired.
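For experimenting without the file, the same SkipTo idea can be sketched against an inline string. This is a minimal sketch under stated assumptions, not the code from the answer: the sample text is shortened and made up (same shape as foo_badging.txt), the names SAMPLE, package_line and sdk_line are hypothetical, the LineStart anchor and the integer parse actions are left out for brevity, and only camelCase pyparsing names are used since those are the ones the answer relies on.

```python
from pyparsing import LineEnd, Literal, Optional, QuotedString, SkipTo

# Shortened, made-up sample in the same shape as foo_badging.txt,
# including an indented line between the header and supports-screens.
SAMPLE = """\
package: name='com.example.app' versionCode='6' versionName='1.003'
sdkVersion:'17'
feature-group: label=''
  uses-feature: name='android.hardware.camera'
supports-screens: 'small' 'normal'
"""

# Header line: each quoted value gets a results name.
package_line = (
    Literal("package:")
    + "name=" + QuotedString("'")("name")
    + "versionCode=" + QuotedString("'")("versionCode")
    + "versionName=" + QuotedString("'")("versionName")
    + LineEnd()
)
sdk_line = Optional("sdkVersion:" + QuotedString("'")("sdkVersion") + LineEnd())

# Only the first quoted value after supports-screens: is captured, as in the answer.
supports_screens = "supports-screens:" + QuotedString("'")("supports_screens")

# SkipTo consumes everything, indented lines included, up to the target text.
expression = package_line + sdk_line + SkipTo("supports-screens:") + supports_screens

result = expression.parseString(SAMPLE).asDict()
print(result)
```

Because SkipTo scans forward for the literal text, indentation on the skipped lines no longer matters. The trade-off is that, without the LineStart anchor used in the answer, the text supports-screens: would also be found in the middle of a line.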