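I'm trying to extract everything between the '|' characters in data I scraped from a website. I noticed that '|' separates all the pieces I'm interested in.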
["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]
I want to print:
title=hello there!
subtitle=how are you
subsubtitle=I'm good, thanks
I think I should use lookbehind and lookahead, but that doesn't work when the text sits between '|' characters.
My guess is something like this:
(?<=title=)(.*)(?=subtitle=)
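(I'm very new to regex, but eager to learn!)
Answer 0 (score: 2)
If you really must use regular expressions, don't overcomplicate them with unnecessary lookbehinds and lookaheads. Those bits are part of the pattern you're trying to match, so just use them: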
title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}
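Note that I also included the | in your prefixes; otherwise the | characters would end up as part of each group. I also turned your greedy .* groups into non-greedy .*? groups. That isn't strictly necessary if you match all the groups, but in your original example it is why the title ends up including sub and the subsubtitle ends up as the subtitle. Finally, I put the } at the end so that you don't get the rest of the outer group as part of the subsubtitle.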
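As a rough sketch of how the pattern above could be applied in Python: the sample string has a newline before each | and before the closing }, and . does not match a newline by default, so this version adds \s* in front of those delimiters. That adjustment and the variable names are assumptions made for illustration, not part of the pattern as posted.

```python
import re

data = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# \s* absorbs the newline that precedes each | and the closing } in the sample data
pattern = re.compile(r"title=(.*?)\s*[|]subtitle=(.*?)\s*[|]subsubtitle=(.*?)\s*}")

match = pattern.search(data)
if match:
    title, subtitle, subsubtitle = match.groups()
    print("title=" + title)
    print("subtitle=" + subtitle)
    print("subsubtitle=" + subsubtitle)
```

If the newlines are removed from the data first, the pattern should work without the extra \s*.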
Answer 1 (score: 1)
You can use the split() method:
In [5]: data = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"[1:-1]
In [6]: data
Out[6]: "somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n"
In [7]: data = data.replace("\n", "")
In [8]: data
Out[8]: "somethingsomething|title=hello there!|subtitle=how are you|subsubtitle=I'm good, thanks"
In [9]: words = data.split("|")
In [10]: words
Out[10]:
['somethingsomething',
'title=hello there!',
'subtitle=how are you',
"subsubtitle=I'm good, thanks"]
In [11]: title = words[1].split("=")[1]
In [12]: title
Out[12]: 'hello there!'
In [13]: subtitle = words[2].split("=")[1]
In [14]: subtitle
Out[14]: 'how are you'
In [15]: subsubtitle = words[3].split("=")[1]
In [16]: subsubtitle
Out[16]: "I'm good, thanks"
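The same split-based idea can be generalized so the fields are looked up by name rather than by position. This is only a sketch: the helper name parse_fields is hypothetical, and it assumes every field of interest has the form key=value between | separators.

```python
data = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

def parse_fields(raw):
    """Return a dict of the key=value fields found between | separators."""
    fields = {}
    # Drop the surrounding braces, then walk the |-separated pieces
    for piece in raw.strip("{}").split("|"):
        # partition splits on the first "=" only, so values may contain "="
        key, sep, value = piece.partition("=")
        if sep:  # skip pieces without "=", such as the leading "somethingsomething"
            fields[key.strip()] = value.strip()
    return fields

fields = parse_fields(data)
for key in ("title", "subtitle", "subsubtitle"):
    print("{0}={1}".format(key, fields[key]))
```

Using partition instead of split("=")[1] keeps a value intact even when it contains an = sign itself.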
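Answer 2 (score: 1)
Regular expressions are only needed when you are dealing with complex strings. A simple string like this one can be handled with string functions alone: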
a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"
b = a.lstrip('["{')
c = b.rstrip('}"]')
c.split('|')
# ['somethingsomething',
# 'title=hello there!\n',
# 'subtitle=how are you\n',
# "subsubtitle=I'm good, thanks\n"]
Answer 3 (score: 0)
A possible solution:
import re
regex = re.compile(r'\["\{([^}]+)\}"\]')
match = regex.match('["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]')
match.groups()[0].split('|')
-> ['somethingsomething', 'title=hello there!\n', 'subtitle=how are you\n', "subsubtitle=I'm good, thanks\n"]
You may want to post-process the strings afterwards, for example to strip the trailing newlines.
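The pieces returned above still carry their trailing newlines. A minimal sketch of one way to clean them up, assuming that whitespace at both ends of each piece is unwanted; the names raw, parts and cleaned are illustrative only.

```python
import re

raw = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'

match = re.match(r'\["\{([^}]+)\}"\]', raw)
# Guard against a non-matching input before touching the group
parts = match.group(1).split('|') if match else []
# str.strip() removes leading and trailing whitespace, including "\n"
cleaned = [part.strip() for part in parts]
print(cleaned)
```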
Answer 4 (score: 0)
I think you can do:
string = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'
string = string[3:-3]
# crop the three first and last characters from the string
sentences = string.split('|')
title = sentences[1]
...
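Note that the result will still include the title= prefix.
Answer 5 (score: 0)
If you want to solve this with regular expressions, here is one way to do it.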
import re
s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]
match = re.search(r'title=(.*)\n', s[0])
if match:
    print("title={0}".format(match.group(1)))
match = re.search(r'subtitle=(.*)\n', s[0])
if match:
    print("subtitle={0}".format(match.group(1)))
match = re.search(r'subsubtitle=(.*)\n', s[0])
if match:
    print("subsubtitle={0}".format(match.group(1)))
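One thing to keep in mind with this approach: title= is also a substring of subtitle= and subsubtitle=, so the searches above only find the right fields because they happen to appear in that order. A hedged sketch that anchors each key on the preceding | instead; the helper name get_field is hypothetical, and it assumes every field starts right after a | and ends with a newline, as in the sample.

```python
import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

def get_field(text, key):
    """Return the value that follows |key= on its line, or None if the key is absent."""
    # The leading \| keeps "title" from matching inside "subtitle"
    match = re.search(r'\|' + re.escape(key) + r'=(.*)\n', text)
    return match.group(1) if match else None

for key in ("title", "subtitle", "subsubtitle"):
    value = get_field(s[0], key)
    if value is not None:
        print("{0}={1}".format(key, value))
```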
Answer 6 (score: 0)
If you want a regex that uses lookahead and lookbehind, you can try the following:
In [1]: import re
In [2]: s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"
In [3]: m = re.findall(r"""(?<=\|)(?P<foo>.*?)(?:\=)(?P<bar>.*?(?=\n))""", s)
In [4]: for i, j in m:
   ...:     print("{} = {}".format(i, j))
   ...:
title = hello there!
subtitle = how are you
subsubtitle = I'm good, thanks