Python: extract values from a string and move on to the next match

Time: 2017-12-08 02:56:35

Tags: python regex python-3.x

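I'm trying to extract the strings delimited by the patterns '{"comments_disabled":' and '}},' and append whatever falls between the two patterns to a list. (There may be more than 100 matches between these patterns.)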

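The problem is that the code below keeps extracting only the first occurrence. How can I make it ignore what has already been appended to the userpost list and move on to the next match?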

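from urllib.request import urlopen

from bs4 import BeautifulSoup

page = urlopen("https://www.instagram.com/explore/tags/fun/")
soup = BeautifulSoup(page, "html.parser")
title = soup.title
# Flatten every <script type="text/javascript"> tag into one string
script = str(soup.findAll('script', type="text/javascript"))

userpost = list()

# Runs once per character of `script`; each pass appends the same first match
for text in script:
    userpost.append(script[script.find('{"comments_disabled":')
                           :script.find('}},') + 2])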

1 answer:

Answer 0 (score: 1)

Try re.findall():

userpost = re.findall(r'{"comments_disabled":(.*?)}},', script)
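For context, the loop in the question repeats itself because str.find() with no start argument always returns the index of the first occurrence, and iterating over a string visits it one character at a time. What follows is only a minimal sketch of the manual equivalent, which passes a start offset to str.find() so that each search resumes after the previous match. The helper name extract_between and the sample string are hypothetical and are not part of the original post.

```python
def extract_between(text, start_marker, end_marker):
    """Collect every substring lying between start_marker and end_marker."""
    results = []
    pos = 0  # where the next search begins
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break  # no further start marker
        content_start = start + len(start_marker)
        end = text.find(end_marker, content_start)
        if end == -1:
            break  # start marker without a closing marker
        results.append(text[content_start:end])
        pos = end + len(end_marker)  # resume after this match
    return results


# Hypothetical sample data, shaped like the delimiters in the question
sample = '{"comments_disabled": one }}, x {"comments_disabled": two }},'
print(extract_between(sample, '{"comments_disabled":', '}},'))
# → [' one ', ' two ']
```

re.findall() performs the same left-to-right, non-overlapping scan internally, which is why it needs no explicit offset bookkeeping.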

Tested script:

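import re

# Sample text with three delimited blocks; the last one spans several lines
script = '''
{"comments_disabled": one two }},
alpha beta
{"comments_disabled": three four }},
{"comments_disabled":
five six
}},
'''

# Non-greedy (.*?) stops at the nearest '}},'; re.DOTALL lets '.' match newlines
userpost = re.findall(r'{"comments_disabled":(.*?)}},', script, re.DOTALL)
print(userpost)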

Output:

[' one two ', ' three four ', '\nfive six\n']
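The third item in that output spans several lines, which depends on the re.DOTALL flag, and the items stay separate because the quantifier is non-greedy. As a rough illustration of both effects, the sketch below runs three variants over a smaller sample. The sample text and variable names are made up for this example, and the braces are escaped only for readability.

```python
import re

# Made-up sample: one single-line block and one block spanning several lines
sample = '''
{"comments_disabled": one two }},
{"comments_disabled":
five six
}},
'''

pattern_lazy = r'\{"comments_disabled":(.*?)\}\},'
pattern_greedy = r'\{"comments_disabled":(.*)\}\},'

# Lazy + DOTALL: each block is captured separately, newlines included
with_dotall = re.findall(pattern_lazy, sample, re.DOTALL)

# Lazy without DOTALL: '.' cannot cross a newline, so the second block is missed
without_dotall = re.findall(pattern_lazy, sample)

# Greedy + DOTALL: one capture running from the first start to the last '}},'
greedy = re.findall(pattern_greedy, sample, re.DOTALL)

print(with_dotall)     # → [' one two ', '\nfive six\n']
print(without_dotall)  # → [' one two ']
print(len(greedy))     # → 1
```

If the blocks in the real page source never contain newlines, the flag makes no difference; it matters only when a match has to cross a line break.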