My text contains the following:
(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)
Here is a complete example of a paragraph containing one of these sentences:
dpkg: error processing package lib32tinfo5 (--install):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libncurses5-dev:amd64:
libncurses5-dev:amd64 depends on libc6-dev | libc-dev; however:
Package libc6-dev is not installed.
Package libc-dev is not installed.
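The whole text is divided into paragraphs like the one above, and each paragraph contains one of these sentences.
I want to use re.findall from Python's re module to get something like this: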
('libc6-dev', '', 'libc-dev', '')
('libc6-x32','2.16')
('libncurses5-dev','5.9+20150516-2ubuntu1')
('libc6-dev-x32','')
('lib32c-dev','')
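In other words, I would like help extracting from such text a tuple of the packages named after "depends on", together with their versions where one is specified.
I wrote this regular expression: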
(?<=depends on )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?|(?: \| )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?(?=;)
and I got this result:
('libc6-dev', '', '', '')
('', '', 'libc-dev', '')
('libc6-x32', '2.16', '', '')
('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
('libc6-dev-x32', '', '', '')
('lib32c-dev', '', '', '')
As you can see, for this sentence:
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;
I got this answer:
('libc6-dev', '', '', '')
('', '', 'libc-dev', '')
instead of this one:
('libc6-dev', '', 'libc-dev', '')
Thank you for your help.
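The split result follows from how re.findall treats a top-level `|`: every match comes from exactly one branch, and each match produces its own tuple with empty strings for the groups of the branch that did not take part. The sketch below uses a deliberately simplified two-branch pattern (an assumption for illustration, not the full pattern from the question) to reproduce the effect, then shows that moving the alternative into an optional group inside a single branch yields one tuple per sentence.

```python
import re

line = "libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;"

# Simplified stand-in for the question's pattern: two top-level branches.
two_branches = r"(?<=depends on )([a-z0-9-]+)|(?: \| )([a-z0-9-]+)(?=;)"
split_matches = re.findall(two_branches, line)

# Same idea, but the "| alternative" part is an optional group in ONE branch,
# so both package names are captured by a single match.
one_branch = r"depends on ([a-z0-9-]+)(?: \| ([a-z0-9-]+))?"
joined_matches = re.findall(one_branch, line)

print(split_matches)   # two tuples, one per branch that matched
print(joined_matches)  # one tuple holding both package names
```

The single-branch form only handles one alternative and no version constraints, so it is a demonstration of the mechanism rather than a complete solution.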
Answer 0 (score: 1)
#!/usr/bin/env python3
import re

# Sample text; named "text" to avoid shadowing the built-in input().
text = """(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)"""

# Option 1: run one pattern per case and concatenate the results.
# a = []
# a += re.findall(r"depends on ([^\s;]+) \| ([^\s;]+)", text)  # sentence 1
# a += re.findall(r"depends on ([^\s;]+) \([><=]{,2} ([^;]+)\)", text)  # sentences 2, 3
# a += re.findall(r"depends on ([^\s;]+)", text)  # sentences 4, 5

# Option 2: combine the three cases into one pattern, most specific branch first.
# Raw strings keep backslashes such as \s from being read as string escapes.
pattern = (
    r"depends on ([^\s;]+) \| ([^\s;]+)"            # "pkg | pkg"
    r"|depends on ([^\s;]+) \([><=]{,2} ([^;]+)\)"  # "pkg (op version)"
    r"|depends on ([^\s;]+)"                        # bare "pkg"
)
m = re.findall(pattern, text)
print(m)
Output:
[
('libc6-dev', 'libc-dev', '', '', ''),
('', '', 'libc6-x32', '2.16', ''),
('', '', 'libncurses5-dev', '5.9+20150516-2ubuntu1', ''),
('', '', '', '', 'libc6-dev-x32'),
('', '', '', '', 'lib32c-dev')
]
You can run the patterns one at a time or all together. I'm not sure whether this helps you.
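If the goal is exactly the flat tuples requested in the question (name, version, name, version, ...), a two-step parse may be easier than a single findall pattern: first capture the whole clause after "depends on", then split it on "|" and parse each alternative separately. The following is a minimal sketch under that assumption; the names CLAUSE_RE, ALT_RE and parse_dependencies are hypothetical, and the patterns are only tuned to the sample lines shown here, not to every form dpkg can print.

```python
import re

# Step 1: everything between "depends on " and the next ";" or line end.
CLAUSE_RE = re.compile(r"depends on ([^;\n]+)")
# Step 2: one alternative = package name plus an optional "(op version)".
ALT_RE = re.compile(r"([^\s(]+)(?:\s*\([<>=]+\s*([^)\s]+)\))?")


def parse_dependencies(text):
    """Return one flat (name, version, name, version, ...) tuple per clause."""
    results = []
    for clause in CLAUSE_RE.findall(text):
        flat = []
        for alternative in clause.split("|"):
            match = ALT_RE.match(alternative.strip())
            if match:
                # A missing version becomes "" so every name has a partner.
                flat.extend([match.group(1), match.group(2) or ""])
        results.append(tuple(flat))
    return results


sample = """(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)"""

for dep in parse_dependencies(sample):
    print(dep)
```

Splitting on "|" keeps each pattern small and also copes with more than two alternatives, which a fixed number of capture groups cannot express.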