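For example, I have a string like this:
"look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
"look[+3]" means that the sentence is about one aspect of some item (here, "look"), and [+3] means it is a positive opinion with a rating of 3. (This actually comes from an Amazon review dataset.)
I want to split it into: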
X: "it 's very sleek looking with a very good front panel button layout , and it has a great feature set ."
Y: [("look", 3), ("panel button layout", 3), ("feature", 2)]
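Answer 0 (score: 3)
One option is to capture everything from the start of the string or from a comma up to the [, and then extract the number that follows the [ (after an optional +):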
>>> import re
>>> s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('look', '3'), ('panel button layout', '3'), ('feature', '2')]
>>>
>>> s = "darn diopter adjustment dial[-1]"
>>> re.findall(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]", s)
[('darn diopter adjustment dial', '-1')]
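where:
(?:^|,) is a non-capturing group that matches either the start of the string or a comma;
(.*?) is a capturing group that non-greedily matches any characters, any number of times;
\[\+?(\-?\d+)\] matches an opening [, then an optional +, then a capturing group for one or more digits (with an optional leading -), then a closing ].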
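To get both X and Y in one pass, the same pattern can be driven with re.finditer. This is only a sketch: split_review and TAG_RE are made-up names, and it assumes the review text itself never contains a bracketed score such as [+1], since that would be picked up as another tag.

```python
import re

# Pattern from the answer above: an aspect name followed by a bracketed score.
TAG_RE = re.compile(r"(?:^|,)(.*?)\[\+?(\-?\d+)\]")

def split_review(s):
    """Return (X, Y): the review text and a list of (aspect, score) pairs."""
    pairs = []
    end = 0  # index just past the last matched tag
    for m in TAG_RE.finditer(s):
        pairs.append((m.group(1), int(m.group(2))))  # int() handles "-1"
        end = m.end()
    # Everything after the last tag is the review text (assumption: no tags in it).
    return s[end:].strip(), pairs

s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
x, y = split_review(s)
print(x)
print(y)  # [('look', 3), ('panel button layout', 3), ('feature', 2)]
```

The scores come back as int rather than str, which matches the Y shown in the question.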
Answer 1 (score: 0)
You can use re.findall(r'(.*\[\+\d+\],?)', s) to get the tag portion of the string, which contains the desired Y output.
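A quick check of this pattern, as a sketch: because .* is greedy, findall returns the whole tag prefix as one string rather than separate pairs. The lazy variant below (LAZY is my own name, not part of the answer) yields one item per tag. Note that \+ is required in both, so a negative score like [-1] would not match.

```python
import re

s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "

GREEDY = r"(.*\[\+\d+\],?)"   # pattern from the answer
LAZY = r"(.*?\[\+\d+\],?)"    # assumed variant with a lazy quantifier

greedy_result = re.findall(GREEDY, s)
lazy_result = re.findall(LAZY, s)

print(greedy_result)  # ['look[+3],panel button layout[+3],feature[+2]']
print(lazy_result)    # ['look[+3],', 'panel button layout[+3],', 'feature[+2]']
```

Either way the items still need a second step to separate the aspect name from the score.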
Answer 2 (score: -1)
([^\]]+[[^\]])+(.*)
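Your key/value pairs are in $1 and the summary is in $2.
Edit: although re does not support multiple matches per group (only the last capture is available), the new regex module does: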
>>> import regex  # third-party module (pip install regex)
>>> m = regex.search(r"([^\]]+[[^\]])+(.*)", "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . ")
>>> m.group(1)
',feature[+2]'
>>> m.captures(1)
['look[+3]', ',panel button layout[+3]', ',feature[+2]']
>>> m.group(2)
"it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
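If installing regex is not an option, a two-step version with the standard re module might look like this. It is a sketch under assumptions: the patterns are my own stricter variants rather than the one above, and the helper name split_tags is hypothetical. It also shows the limitation mentioned in the edit, namely that group(1) only holds the last repetition.

```python
import re

# Step 1: a repeated group consumes the tag prefix; group 2 is the review text.
PREFIX_RE = re.compile(r"(,?[^\[\]]+\[[^\]]+\])+(.*)")
# Step 2: pull each (aspect, score) pair out of the prefix.
PAIR_RE = re.compile(r"([^\[\],]+)\[([+-]?\d+)\]")

def split_tags(s):
    """Return (summary, pairs) using only the standard library."""
    m = PREFIX_RE.match(s)
    if m is None:  # no tags at the start of the string
        return s, []
    prefix = s[:m.start(2)]  # everything before the summary
    pairs = [(name, int(score)) for name, score in PAIR_RE.findall(prefix)]
    return m.group(2), pairs

s = "look[+3],panel button layout[+3],feature[+2]it 's very sleek looking with a very good front panel button layout , and it has a great feature set . "
m = PREFIX_RE.match(s)
print(m.group(1))     # ',feature[+2]' (re keeps only the last repetition)
print(split_tags(s))
```

Unlike m.captures(1) in regex, this re-scans the prefix, which should be fine for short strings like these.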