Finding sentences between "|" characters with a regex

Posted: 2015-04-27 10:16:52

Tags: python regex

I'm looking for anything between the '|' characters in data I scraped from a website. I noticed that '|' separates everything I'm interested in.

["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

I want to print:

title=hello there!
subtitle=how are you
subsubtitle=I'm good, thanks

I figured I should use a lookbehind and a lookahead, but it doesn't work when the text sits between '|' characters.

My guess is that it looks something like this:

(?<=title=)(.*)(?=subtitle=)

(I'm very new to regex, but eager to learn!)

7 answers:

Answer 0 (score: 2)

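If you really must use a regex, don't overcomplicate it with unnecessary lookbehinds and lookaheads. Those bits are part of the pattern you are trying to match, so just use them: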

title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}

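Note that I also included the | in your prefixes; otherwise the | character would end up as part of each group. I also turned your greedy .* groups into non-greedy .*? ones. That isn't strictly necessary when you match all of the groups at once, but in your original example it is why title ended up including "sub", and why subtitle ended up matching as subsubtitle. Finally, I put the } at the end so that the outer grouping isn't captured as part of subsubtitle.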
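A minimal sketch of using this pattern from Python, assuming the scraped text is a single string with real newline characters, as in the question. Because "." does not match "\n" by default, the sketch passes re.DOTALL and strips the trailing newline from each group; that flag and the strip() calls are additions, not part of the answer. The second half reproduces the greedy-versus-lazy behaviour described above on a copy of the text with the newlines removed.

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# The answer's pattern; re.DOTALL (an assumption) lets ".*?" cross the "\n" before each "|".
pattern = r"title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}"
m = re.search(pattern, s, re.DOTALL)
title, subtitle, subsubtitle = (g.strip() for g in m.groups())
print("title=" + title)
print("subtitle=" + subtitle)
print("subsubtitle=" + subsubtitle)

# The question's lookaround pattern, tried on a newline-free copy of the text.
flat = s.replace("\n", "")
greedy = re.search(r"(?<=title=)(.*)(?=subtitle=)", flat).group(1)
lazy = re.search(r"(?<=title=)(.*?)(?=subtitle=)", flat).group(1)
print(greedy)  # runs on to the "subtitle=" inside "subsubtitle=", so it ends in "|sub"
print(lazy)    # stops at the first "subtitle=", but still keeps the "|"
```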

Answer 1 (score: 1)

You can use the split() method:

In [5]: data = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"[1:-1]
In [6]: data
Out[6]: "somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n"
In [7]: data = data.replace("\n", "")
In [8]: data
Out[8]: "somethingsomething|title=hello there!|subtitle=how are you|subsubtitle=I'm good, thanks"
In [9]: words = data.split("|")
In [10]: words
Out[10]: 
['somethingsomething',
 'title=hello there!',
 'subtitle=how are you',
 "subsubtitle=I'm good, thanks"]
In [11]: title = words[1].split("=")[1]
In [12]: title
Out[12]: 'hello there!'
In [13]: subtitle = words[2].split("=")[1]
In [14]: subtitle
Out[14]: 'how are you'
In [15]: subsubtitle = words[3].split("=")[1]
In [16]: subsubtitle
Out[16]: "I'm good, thanks"
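The same split() idea can be written without hard-coding positions such as words[1]. A sketch, assuming every field after the first has the form name=value: each piece is split on its first "=" only (maxsplit=1), so a value that itself contains "=" would stay intact, and the pairs are collected in a dict.

```python
s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# Drop the braces and the newlines, then split into fields on "|".
words = s[1:-1].replace("\n", "").split("|")

# Skip the leading "somethingsomething" piece; split each field on its first "=" only.
fields = dict(word.split("=", 1) for word in words[1:])

for name, value in fields.items():
    print(name + "=" + value)
```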

Answer 2 (score: 1)

You only need a regex for complex strings. A simple string like this one can be handled with plain string methods:

a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"
b = a.lstrip('["{')
c = b.rstrip('}"]')
c.split('|')
# ['somethingsomething',
# 'title=hello there!\n',
# 'subtitle=how are you\n',
# "subsubtitle=I'm good, thanks\n"]
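One caveat: lstrip() and rstrip() remove any run of the listed characters, not an exact prefix or suffix. Here that is harmless, but they would also strip a leading '[', '"' or '{' from the first field (or a trailing '}', '"' or ']' from the last) if one happened to be there. On Python 3.9 or later, str.removeprefix() and str.removesuffix() remove exactly one literal prefix or suffix. A sketch under that version assumption, which also strips the leftover newlines:

```python
a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"

# Requires Python 3.9+: remove exactly '["{' from the start and '}"]' from the end.
c = a.removeprefix('["{').removesuffix('}"]')

# Split on "|" and drop the trailing newline from each piece.
parts = [part.strip() for part in c.split("|")]
print(parts)
```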

Answer 3 (score: 0)

One possible solution:

import re

# Capture everything between '["{' and '}"]', then split it on '|'.
regex = re.compile(r'\["\{([^}]+)\}"\]')
match = regex.match('["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]')
match.groups()[0].split('|')

-> ['somethingsomething', 'title=hello there!\n', 'subtitle=how are you\n', "subsubtitle=I'm good, thanks\n"]

You may then want to run a search over each of the resulting strings.
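One way to do that follow-up search, as a sketch: run re.match over each piece with a simple name=value pattern (the pattern and the names field_re and found are illustrative, not from the answer). The piece without an "=" (somethingsomething) doesn't match and is skipped, and because "." stops at "\n", the trailing newline is left out of the value.

```python
import re

pieces = ['somethingsomething', 'title=hello there!\n', 'subtitle=how are you\n', "subsubtitle=I'm good, thanks\n"]

# name=value, where the name is word characters; ".*" stops before the "\n".
field_re = re.compile(r"(\w+)=(.*)")

found = []
for piece in pieces:
    m = field_re.match(piece)
    if m:  # 'somethingsomething' has no "=", so it is skipped
        found.append((m.group(1), m.group(2)))
        print("{}={}".format(m.group(1), m.group(2)))
```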

Answer 4 (score: 0)

I think you can do:

string = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'
string = string[3:-3]  # drop the first three and last three characters ('["{' and '}"]')
sentences = string.split('|')
title = sentences[1]
...

This will leave title= in the result.
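If you only want the value, str.partition('=') splits on the first '=' and its third element is the text after it; strip() then removes the trailing newline. A sketch applied to all three fields (the name values is illustrative):

```python
string = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'
sentences = string[3:-3].split('|')

# sentences[0] is 'somethingsomething'; the others look like 'name=value\n'.
# partition('=') splits on the first '=' only; index 2 is everything after it.
values = [sentence.partition('=')[2].strip() for sentence in sentences[1:]]
title = values[0]
print(title)
print(values)
```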

Answer 5 (score: 0)

If you want to solve this with a regex, here is one way to do it:

import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

match = re.search(r'title=(.*)\n', s[0])
if match:
    print("title={0}".format(match.group(1)))

match = re.search(r'subtitle=(.*)\n', s[0])
if match:
    print("subtitle={0}".format(match.group(1)))

match = re.search(r'subsubtitle=(.*)\n', s[0])
if match:
    print("subsubtitle={0}".format(match.group(1)))
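The three near-identical searches can be folded into a loop. In this sketch the pattern is also anchored on the preceding "|" (an addition to the answer), so that searching for title= cannot accidentally match inside subtitle= or subsubtitle= if the fields ever appear in a different order; re.escape() guards the field name, and the results dict is an illustrative name.

```python
import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

results = {}
for name in ("title", "subtitle", "subsubtitle"):
    # "[|]" pins the match to the start of a field; ".*" stops at the newline.
    match = re.search(r"[|]" + re.escape(name) + r"=(.*)", s[0])
    if match:
        results[name] = match.group(1)
        print("{0}={1}".format(name, match.group(1)))
```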

Answer 6 (score: 0)

If you want a regex that uses lookahead and lookbehind, you can try the following:

In [1]: import re

In [2]: s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

In [3]: m = re.findall(r"""(?<=\|)(?P<foo>.*?)(?:\=)(?P<bar>.*?(?=\n))""", s)

In [4]: for i, j in m:
   ...:     print("{} = {}".format(i, j))
   ...:
title = hello there!
subtitle = how are you
subsubtitle = I'm good, thanks
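Since the pattern already names its groups (foo and bar), re.finditer() lets you read them by name instead of by tuple position. A sketch with the same pattern; the fields and as_dict names are illustrative:

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"
pattern = re.compile(r"(?<=\|)(?P<foo>.*?)(?:\=)(?P<bar>.*?(?=\n))")

# Each match object exposes the named groups: "foo" is the field name, "bar" its value.
fields = {}
for m in pattern.finditer(s):
    fields[m.group("foo")] = m.group("bar")
    print("{} = {}".format(m.group("foo"), m.group("bar")))

# findall() returns (foo, bar) tuples, so dict() builds the same mapping directly.
as_dict = dict(pattern.findall(s))
```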