Question

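I seem to be doing something wrong. I download an HTML page with urllib, then use BeautifulSoup to find all elements whose ID appears in a given list. That part works, but the output is messy and includes newline characters ("\n").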

Python: 2.7.12
BeautifulSoup: bs4

I tried using prettify() to clean up the output, but I always get an error:

AttributeError: 'ResultSet' object has no attribute 'prettify'

import urllib
import re
from bs4 import BeautifulSoup

cfile = open("test.txt")
clist = cfile.read()
clist = clist.split('\n')

i=0

while i<len (clist):
    url = "https://example.com/"+clist[i]
    htmlfile = urllib.urlopen (url)
    htmltext = htmlfile.read()

    soup = BeautifulSoup (htmltext, "html.parser")
    soup = soup.findAll (id=["id1", "id2", "id3"])

    print soup.prettify()
    i+=1

I'm sure the problem is something simple on this line:

soup = soup.findAll (id=["id1", "id2", "id3"])

I'm just not sure what. Sorry if this is a silly question. I've only been using Python and Beautiful Soup for a few days.

Answer 1

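You are reassigning the soup variable to the result of .findAll(), which is a ResultSet object (basically a list of tags) that has no prettify() method.

The solution is to keep the soup variable pointing to the BeautifulSoup instance and store the search results under a different name.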
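
As a rough sketch of that fix: the snippet below keeps `soup` as the parsed document and puts the matches in a separate variable. The inline `html` string and the names `matches` and `pretty_document` are made up for illustration; they stand in for the page the question fetches with urllib. It uses `find_all`, the current spelling of `findAll` in bs4, and `print()` with a single argument so it runs on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the question downloads with urllib.
html = (
    '<div id="id1"><p>one</p></div>'
    '<div id="id2"><p>two</p></div>'
    '<div id="other">skip</div>'
)

soup = BeautifulSoup(html, "html.parser")

# Keep the parsed document in `soup`; store the matches under another name.
matches = soup.find_all(id=["id1", "id2", "id3"])

# `soup` is still a BeautifulSoup instance, so prettify() is available on it.
pretty_document = soup.prettify()
print(pretty_document)
```

Note that this prettifies the whole document, not only the matched elements.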

Answer 2

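You can call prettify() on the top-level BeautifulSoup object, or on any of its Tag objects.

findAll returns a list of matching tags, so your code is equivalent to [tag1, tag2, ...].prettify(), which cannot work because the list itself has no such method.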
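
One way to apply this, assuming only the matched elements should be printed, is to loop over the result and call prettify() on each Tag. The sample `html` string and the names `tags`, `pretty_tags`, and `texts` are illustrative and not from the original post. The last line is an optional extra: if only the text content is wanted, `get_text(strip=True)` drops surrounding whitespace such as "\n".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real script reads it from a URL.
html = '<div id="id1"><p>one</p></div><span id="id2">two</span>'

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all(id=["id1", "id2", "id3"])

# Each item in the result is a Tag, and every Tag has prettify().
pretty_tags = [tag.prettify() for tag in tags]
for pretty in pretty_tags:
    print(pretty)

# Optional: extract just the text of each match, without stray whitespace.
texts = [tag.get_text(strip=True) for tag in tags]
```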

Python BeautifulSoup can't prettify
