How to write a regular expression to find multiple strings in Python

Time: 2016-03-14 17:44:55

Tags: python regex

For example, I have a string like this:
"look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "

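"look[+3]" means that the sentence is about the "look" aspect of an item, and [+3] means that it is a positive comment with a rating of 3. (This actually comes from an Amazon review dataset.)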

I want to split it into:

X: "it 's very sleek looking with a very good front panel button layout , and it has a great feature set ."

Y: [("look", 3), ("panel button layout", 3), ("feature", 2)]

3 answers:

Answer 0: (score: 3)

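One option is to capture everything after the start of the string or a comma, up to a [, and then extract the signed number inside the brackets: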

>>> import re
>>> s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('look', '3'), ('panel button layout', '3'), ('feature', '2')]
>>>
>>> s = "darn diopter adjustment dial[-1]"
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('darn diopter adjustment dial', '-1')]

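where:

  • (?:^|,) is a non-capturing group that matches either the start of the string or a comma
  • (.*?) is a capturing group that matches any character, any number of times, non-greedily
  • \[\+?(\-?\d+)\] matches an opening [, followed by an optional +, followed by a capturing group that captures one or more digits (with an optional leading -), followed by a closing ]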
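The pattern explained above returns the ratings as strings and does not by itself produce X. A minimal sketch of getting both X and Y follows; the helper name `split_review` is hypothetical, and it assumes the tags only ever appear as a prefix of the string, so that everything after the last tag is the review text.

```python
import re

# Same pattern as in the answer: aspect name, then a signed rating in brackets.
TAG = re.compile(r"(?:^|,)(.*?)\[\+?(-?\d+)\]")

def split_review(s):
    """Return (X, Y): the review text and a list of (aspect, rating) pairs.

    Hypothetical helper; assumes all tags come before the review text.
    """
    matches = list(TAG.finditer(s))
    # Y: convert each captured rating from str to int.
    y = [(m.group(1), int(m.group(2))) for m in matches]
    # X: whatever follows the last tag, with surrounding whitespace removed.
    x = s[matches[-1].end():].strip() if matches else s.strip()
    return x, y

s = ("look[+3],panel button layout[+3],feature[+2]it 's very sleek looking "
     "with a very good front panel button layout , and it has a great "
     "feature set . ")
x, y = split_review(s)
print(x)
print(y)  # [('look', 3), ('panel button layout', 3), ('feature', 2)]
```

Using `finditer` instead of `findall` keeps the match positions, which is what makes it possible to slice off the text after the last tag.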

Answer 1: (score: 0)

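You can use re.findall('(.*\[\+\d+\],?)', s) to get at the Y part. Note that the greedy .* returns the whole tag prefix as a single match; with a non-greedy .*? you get one match per tag instead.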
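A quick way to see how this pattern behaves is to run the greedy and non-greedy forms side by side. The third pattern below is not from the answer; it is an assumed variant that captures the name and the rating separately, and like the answer's pattern it only handles ratings written with a +.

```python
import re

s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking"

# Greedy: .* runs to the last [+N] in the string, so there is one big match.
greedy = re.findall(r"(.*\[\+\d+\],?)", s)

# Non-greedy: .*? stops at the first [+N] it can, so there is one match per tag.
lazy = re.findall(r"(.*?\[\+\d+\],?)", s)

# Assumed variant: capture the aspect name and the rating as separate groups.
pairs = [(name, int(score))
         for name, score in re.findall(r"([^,\[]+)\[\+(\d+)\]", s)]

print(greedy)  # ['look[+3],panel button layout[+3],feature[+2]']
print(lazy)    # ['look[+3],', 'panel button layout[+3],', 'feature[+2]']
print(pairs)   # [('look', 3), ('panel button layout', 3), ('feature', 2)]
```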

Answer 2: (score: -1)

Try this regular expression:

([^\]]+[[^\]])+(.*)

Your key/value pairs are in $1 and the summary is in $2.
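With the standard-library re module, a repeated group such as $1 keeps only its last repetition. The sketch below checks that and shows one possible two-step workaround. It uses a simplified variant of the pattern, ending each item at the closing ], which is an assumption rather than the answer's exact expression.

```python
import re

s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking"

# A repeated capturing group in re retains only its final repetition.
last_only = re.match(r"([^\]]+\])+(.*)", s).group(1)

# Workaround: capture the whole tag prefix once, then split it in a second pass.
m = re.match(r"((?:[^\]]+\])+)(.*)", s)
prefix, summary = m.group(1), m.group(2)
tags = re.findall(r"[^\],]+\]", prefix)

print(last_only)  # ,feature[+2]
print(tags)       # ['look[+3]', 'panel button layout[+3]', 'feature[+2]']
print(summary)    # it 's very sleek looking
```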

Edit: although re does not support multiple captures per group (only the last capture is available), the newer third-party regex module does:

>>> import regex
>>> m = regex.search(r"([^\]]+[[^\]])+(.*)", "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
>>> m.group(1)
',feature[+2]'
>>> m.captures(1)
['look[+3]', ',panel button layout[+3]', ',feature[+2]']
>>> m.group(2)
"it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "