Question

When I call findall more than once I get an index out of range error, but if I use it only once the code works fine.

from re import findall
news = open('download7.html', 'r')

title = findall('<item>[^<]+<title>(.*)</title>', news.read())
link = findall('<item>[^<]+<title>[^<]+</title>[^<]+<link>(.*)</link>', news.read())
description = findall('<!\[CDATA\[[^<]+<p>(.*)</p>', news.read())
pubdate = findall('<pubDate>([^<]+)</pubDate>', news.read())
image_regex = findall('url="([^"]+627.jpg)', news.read())
print(image_regex[0])

Answer 1

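Calling .read() on a file object reads all remaining data from the file and leaves the file pointer at the end of the file, so every subsequent call to .read() returns an empty string. findall on an empty string returns an empty list, which is why image_regex[0] raises an IndexError.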
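
To see this behaviour in isolation, here is a minimal sketch. It writes a throwaway file with made-up contents (the temporary file and its text are assumptions for illustration, not the asker's download7.html) and then reads it twice:

```python
import os
import tempfile
from re import findall

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as tmp:
    tmp.write('<title>Example</title>')
    path = tmp.name

with open(path, 'r') as news:
    first = news.read()   # reads everything; the pointer is now at end of file
    second = news.read()  # nothing left to read

os.remove(path)

# The second read is empty, so findall has nothing to match against.
matches = findall(r'<title>(.*)</title>', second)

print(repr(first))
print(repr(second))
print(matches)
```

Indexing matches[0] here would fail in the same way as in the question, because the list is empty.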

Read the file contents once, cache them, and reuse the cached string:

from re import findall

# Read the whole file once; the with block closes it afterwards.
with open('download7.html', 'r') as news:
    newsdata = news.read()

# Raw strings keep regex escapes such as \[ from being treated as string escapes.
title = findall(r'<item>[^<]+<title>(.*)</title>', newsdata)
link = findall(r'<item>[^<]+<title>[^<]+</title>[^<]+<link>(.*)</link>', newsdata)
description = findall(r'<!\[CDATA\[[^<]+<p>(.*)</p>', newsdata)
pubdate = findall(r'<pubDate>([^<]+)</pubDate>', newsdata)
image_regex = findall(r'url="([^"]+627.jpg)', newsdata)
print(image_regex[0])

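Note: you can re-read from the file object by seeking back to the start after each read (call news.seek(0)), but this is much less efficient when you need the complete file data repeatedly.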
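
For completeness, a rough sketch of the seek(0) approach. The temporary file and its contents are invented for illustration; only the pattern of rewinding between reads is the point:

```python
import os
import tempfile
from re import findall

# Throwaway file with made-up feed content, so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as tmp:
    tmp.write('<title>Example</title><pubDate>Mon, 01 Jan 2024</pubDate>')
    path = tmp.name

with open(path, 'r') as news:
    title = findall(r'<title>(.*)</title>', news.read())
    news.seek(0)  # rewind to the start before reading again
    pubdate = findall(r'<pubDate>([^<]+)</pubDate>', news.read())

os.remove(path)

print(title)
print(pubdate)
```

Each findall now sees the full text, at the cost of reading the file from disk once per pattern.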

Cannot use multiple instances of findall
