Question

I'm looking for everything between the '|' characters in data I scraped from a website. I noticed that '|' separates all the pieces I'm interested in.

["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

I would like to print:

title=hello there!
subtitle=how are you
subsubtitle= I'm good, thanks

I think I should use lookbehind and lookahead, like this, but it doesn't work when the text sits between '|' characters.

I guess it's something like:

(?<=title=)(.*)(?=subtitle=)

(I'm very new to RegEx, but eager to learn!)

Answer 1

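If you really must use a regex, don't overcomplicate it with unnecessary lookbehinds and lookaheads. Those bits are part of the pattern you are trying to match, so just use them: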

title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}

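Note that I also included the | in your prefixes; otherwise the | characters would end up as part of each group. I also turned your greedy .* groups into non-greedy .*? groups. That isn't strictly necessary if you match all the groups, but in your original example greediness is why the title ends up including sub and the subtitle ends up as the subsubtitle. Finally, I put the } at the end so that the rest of the outer grouping doesn't become part of the subsubtitle.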
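
One caveat when running this against the scraped string in Python: the sample has a real newline before each | and before the closing }, and . does not match a newline by default. The sketch below is one way to handle that. The \s* pieces and the variable names are assumptions added for illustration, not part of the pattern above. It also shows the over-capture that the greedy lookaround attempt from the question produces once re.DOTALL lets it match at all.

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# The greedy lookaround attempt from the question. With re.DOTALL it matches,
# but .* runs on to the last "subtitle=", which sits inside "subsubtitle=".
greedy = re.search(r"(?<=title=)(.*)(?=subtitle=)", s, re.DOTALL)
print(repr(greedy.group(1)))

# Assumption: \s* is added to absorb the newline before each '|' and before '}'.
pattern = re.compile(r"title=(.*?)\s*[|]subtitle=(.*?)\s*[|]subsubtitle=(.*?)\s*}")
fields = pattern.search(s).groups()
for name, value in zip(("title", "subtitle", "subsubtitle"), fields):
    print(name + "=" + value)
```

Passing re.DOTALL to the original pattern would also work, but then the captured groups keep their trailing newline and need stripping afterwards.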

Answer 2

You can use the split() method:

In [5]: data = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"[1:-1]
In [6]: data
Out[6]: "somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n"
In [7]: data = data.replace("\n", "")
In [8]: data
Out[8]: "somethingsomething|title=hello there!|subtitle=how are you|subsubtitle=I'm good, thanks"
In [9]: words = data.split("|")
In [10]: words
Out[10]: 
['somethingsomething',
 'title=hello there!',
 'subtitle=how are you',
 "subsubtitle=I'm good, thanks"]
In [11]: title = words[1].split("=")[1]
In [12]: title
Out[12]: 'hello there!'
In [13]: subtitle = words[2].split("=")[1]
In [14]: subtitle
Out[14]: 'how are you'
In [15]: subsubtitle = words[3].split("=")[1]
In [16]: subsubtitle
Out[16]: "I'm good, thanks"
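
The session above can be folded into a small helper. This is only a sketch: parse_fields is a hypothetical name, and using partition("=") instead of split("=")[1] is my own choice so that a value containing "=" is not cut short.

```python
raw = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"


def parse_fields(text):
    """Split a '{...|key=value\\n|...}' blob into a dict (hypothetical helper)."""
    # strip("{}") removes any leading/trailing '{' or '}' characters.
    body = text.strip().strip("{}")
    fields = {}
    for piece in body.split("|"):
        # partition splits on the first '=' only; sep is '' when there is none.
        key, sep, value = piece.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


fields = parse_fields(raw)
for key in ("title", "subtitle", "subsubtitle"):
    print(key + "=" + fields[key])
```

Pieces without an "=" (here, somethingsomething) are skipped rather than raising an IndexError.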

Answer 3

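Regular expressions are only needed when dealing with complex strings. A simple string like this one can be handled with string functions alone: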

a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"
b = a.lstrip('["{')
c = b.rstrip('}"]')
c.split('|')
# ['somethingsomething',
# 'title=hello there!\n',
# 'subtitle=how are you\n',
# "subsubtitle=I'm good, thanks\n"]

Answer 4

A possible solution:

import re

regex = re.compile(r'\["\{([^}]+)\}"\]')
match = regex.match('["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]')
match.groups()[0].split('|')

-> ['somethingsomething', 'title=hello there!\n', 'subtitle=how are you\n', "subsubtitle=I'm good, thanks\n"]

You may want to strip the strings afterwards, since each piece still ends with a newline.
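
The pieces in the result above still carry a trailing "\n". A minimal sketch of cleaning them up, assuming plain whitespace stripping is all that is wanted (raw and parts are illustrative names):

```python
import re

raw = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'

match = re.match(r'\["\{([^}]+)\}"\]', raw)
# Guard against a failed match, then strip whitespace from every piece.
parts = [piece.strip() for piece in match.group(1).split('|')] if match else []
print(parts)
```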

Answer 5

I think you can do:

string = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'
string = string[3:-3]
# crop the three first and last characters from the string
sentences = string.split('|')
title = sentences[1]
...

This will include title= in the result.

Answer 6

If you want to solve this with a regular expression, one way is as follows.

import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

match = re.search(r'title=(.*)\n', s[0])
if match:
    print("title={0}".format(match.group(1)))

match = re.search(r'subtitle=(.*)\n', s[0])
if match:
    print("subtitle={0}".format(match.group(1)))

match = re.search(r'subsubtitle=(.*)\n', s[0])
if match:
    print("subsubtitle={0}".format(match.group(1)))
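
Note that 'title=' also occurs inside 'subtitle=' and 'subsubtitle=', so the three searches above only pick the right field because of the order in which the fields appear. A hedged alternative, assuming every key is made of word characters and directly follows a '|', is to collect all pairs in one pass:

```python
import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

# \| anchors each key to the start of a field; (.*) stops at the newline
# because '.' does not match '\n' by default.
pairs = re.findall(r"\|(\w+)=(.*)", s[0])
for key, value in pairs:
    print("{0}={1}".format(key, value))
```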

Answer 7

If you want a regex that uses lookahead and lookbehind, you can try the following:

In [1]: import re

In [2]: s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

In [3]: m = re.findall(r"""(?<=\|)(?P<foo>.*?)(?:\=)(?P<bar>.*?(?=\n))""", s)

In [4]: for i,j in m:
   ...:     print("{} = {}".format(i, j))
   ...:     
title = hello there!
subtitle = how are you
subsubtitle = I'm good, thanks

Finding the sentences between "|"s with a regular expression
