How do I parse the elements of this list?

Time: 2019-01-01 12:47:32

Tags: python regex

I have a list that I want to parse (but I am looking for a general way to parse any list like it):

dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390
dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc
dev-lang/perl-5.28-r1 s390
virtual/ruby_gems-0.3_pre24 amd64 x86

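My parsing sometimes fails because it tries to parse the architecture list (for example, everything from alpha to the end of the line). What I really want is to ignore everything after the package version, while still allowing for whitespace after the version.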

My code is as follows (it only prints debug output):

for line in args.list:
    print(line)
    package_category = re.search(r'((?<==)\w+-\w+|\w+-\w+|\w+)', line).group(0)
    print(package_category)
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
    print(package_name)
    package_version = re.search(r'(?<=-)\d+.\d-*\w*\s?', line).group(0)

This is what I would like it to do:

The package_category variable should contain the category, such as:

dev-libs dev-lang virtual

package_name should contain a package name, for example:

icu icu-layoutex perl ruby_gems

package_version:

63.1-r1 63.1 0.3_pre24

Everything else should be ignored.

At the moment, I somehow end up in the architecture list:

dev-libs/icu-63.1-r1
dev-libs
icu
alpha
alpha
Traceback (most recent call last):
  File "./repomator.py", line 47, in <module>
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

1 Answer:

Answer 0: (score: 1)

Is this what you want?

Code:

(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)

Explanation:

(?<category>            # named group category
  \w+                   # 1 or more word character
  (?:-\w+)?             # optional, a dash then 1 or more word character
)                       # end group
/                       # a slash
(?<name>                # named group name
  [a-z]+                # 1 or more alpha
  (?:[-_][a-z]+)?       # optional, dash or underscore and 1 or more alpha
)                       # end group
-                       # a dash
(?<version>             # named group version
  \S+                   # 1 or more non space character
)                       # end group
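
To make the pattern above concrete, here is a minimal sketch of how it might be applied in Python. The helper name parse_package and the lines list are hypothetical and exist only for illustration; the sample strings come from the question. Note that Python's re module spells named groups as (?P<name>...). The sketch also checks for None before reading the groups, because re.search returns None when nothing matches, which is what happens for a bare token such as alpha.

```python
import re

# Pattern from the answer, using Python's (?P<name>...) spelling for named groups.
PACKAGE_RE = re.compile(
    r'(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)'
)


def parse_package(line):
    """Return (category, name, version), or None if no package is found.

    Hypothetical helper, not part of the original answer.
    """
    match = PACKAGE_RE.search(line)
    if match is None:  # e.g. a bare architecture keyword such as "alpha"
        return None
    return match.group('category'), match.group('name'), match.group('version')


# Sample lines taken from the question, plus a bare architecture token.
lines = [
    'dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390',
    'dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc',
    'dev-lang/perl-5.28-r1 s390',
    'virtual/ruby_gems-0.3_pre24 amd64 x86',
    'alpha',
]

for line in lines:
    print(parse_package(line))
```

Because the version group is \S+, it stops at the first whitespace, so the architecture keywords after the version are never captured. One limitation to be aware of: the name group allows only lowercase letters with a single dash or underscore, so a name with more separators or with digits would probably need a broader pattern.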