Question

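I am trying to extract the strings that fall between the patterns '{"comments_disabled":' and '}},' and append everything found between those two patterns to a list. (There may be more than 100 matches between these patterns.)

The problem is that the code below just keeps extracting the first occurrence. How can I make it ignore what has already been appended to the userpost list and move on to the next match?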

from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("https://www.instagram.com/explore/tags/fun/")
soup = BeautifulSoup(page, "html.parser")
title = soup.title
script = str(soup.findAll('script', type="text/javascript"))

userpost = list()

for text in script:
    userpost.append(script[script.find('{"comments_disabled":')
                           :script.find('}},')+2])

Answer 1

Try re.findall():

userpost = re.findall(r'{"comments_disabled":(.*?)}},', script)

Tested script:

import re

script = '''
{"comments_disabled": one two }},
alpha beta
{"comments_disabled": three four }},
{"comments_disabled":
five six
}},
'''

# re.DOTALL lets "." match newlines, so matches may span several lines
userpost = re.findall(r'{"comments_disabled":(.*?)}},', script, re.DOTALL)
print(userpost)

Output:

[' one two ', ' three four ', '\nfive six\n']
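
If you would rather stay close to the str.find() approach from the question, the usual trick is to pass a start offset as the second argument to find(), so each search resumes after the previous match instead of starting at index 0 again. The following is only a minimal sketch of that idea: the helper name extract_between and the sample string are made up for illustration and are not part of the original code.

```python
def extract_between(text, start_marker, end_marker):
    """Collect every substring located between start_marker and end_marker."""
    results = []
    pos = 0  # where the next search begins
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break  # no further start marker
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:
            break  # start marker without a closing marker
        results.append(text[start:end])
        pos = end + len(end_marker)  # skip past what was just consumed
    return results


# Hypothetical sample data, standing in for the real page source
sample = '{"comments_disabled": one }}, junk {"comments_disabled": two }},'
userpost = extract_between(sample, '{"comments_disabled":', '}},')
print(userpost)
```

Like the re.findall() call, this returns only the text between the two markers, not the markers themselves.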

Python: extract a value from a string and move on to the next
