In the code below, what does each element of the pattern string in
re.sub('<[^>]*>|[\n]|\[[0-9]*\]', '', htmlread) mean?
import urllib2
import re
htmltext = urllib2.urlopen("https://en.wikipedia.org/wiki/Linkin_Park")
htmlread = htmltext.read()
htmlread = re.sub('<[^>]*>|[\n]|\[[0-9]*\]', '', htmlread)
regex = '(?<=Linkin Park was founded)(.*)(?=the following year.)'
pattern = re.compile(regex)
htmlread = re.findall(pattern, htmlread)
print "Linkin Park was founded" + htmlread[0] + "the following year."
Answer 0 (score: 0)
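The line htmlread = re.sub('<[^>]*>|[\n]|\[[0-9]*\]', '', htmlread)
removes the following from htmlread:
anything between < and >, i.e. an HTML tag (matched by <[^>]*>)
OR a newline character (matched by [\n])
OR a pair of square brackets containing zero or more digits, such as [12] (matched by \[[0-9]*\])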
A useful wiki post: Reference - What does this regex mean?
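To see what each alternative contributes, the pattern can be split at the | signs and each piece applied on its own. The following is only a sketch: the sample string is made up for illustration (it is not taken from the actual Wikipedia page), and the names TAG, NEWLINE and CITATION are hypothetical labels that do not appear in the question. It is written so that it should also run on Python 3.

```python
import re

# Hypothetical sample input, invented for illustration only.
sample = '<p>Linkin Park is a <b>rock</b> band.[1]\nIt formed in 1996.[23][]</p>'

# The three alternatives of the pattern; | between them means "or".
TAG = r'<[^>]*>'          # '<', then any run of characters other than '>', then '>'
NEWLINE = r'[\n]'         # a character class containing only the newline character
CITATION = r'\[[0-9]*\]'  # a literal '[', zero or more digits, a literal ']'

# Apply each alternative separately to see what it removes.
tags_removed = re.sub(TAG, '', sample)
newlines_removed = re.sub(NEWLINE, '', sample)
citations_removed = re.sub(CITATION, '', sample)

# The combined pattern, equivalent to the one in the question.
cleaned = re.sub(TAG + '|' + NEWLINE + '|' + CITATION, '', sample)

print(cleaned)  # Linkin Park is a rock band.It formed in 1996.
```

Note that [0-9]* allows zero digits, so an empty pair of brackets [] is removed as well, and that removing the newline without inserting a space joins "band." and "It" together.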
Answer 1 (score: 0)
It replaces every match of the pattern with '' (the empty string), which means removing it from the htmlread variable.
Please read more about RegEx.
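A small sketch of why substituting the empty string amounts to deletion, assuming a made-up input string (the variable names here are hypothetical). Swapping in a visible replacement such as '_' shows where the matches were, and re.subn additionally reports how many substitutions were made.

```python
import re

# Same pattern as in the question, written as a raw string.
pattern = re.compile(r'<[^>]*>|[\n]|\[[0-9]*\]')

# Hypothetical input containing one tag, one newline and one bracketed number.
text = 'a<br>b\nc[7]d'

removed = pattern.sub('', text)         # every match replaced by nothing
marked = pattern.sub('_', text)         # every match replaced by a visible marker
result, count = pattern.subn('', text)  # like sub, but also returns the match count

print(removed)  # abcd
print(marked)   # a_b_c_d
print(count)    # 3
```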