Question

I am using Beautiful Soup to parse an HTML table.

Python version 3.2
Beautiful Soup version 4.1.3

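I am running into a problem when trying to use the findAll method to find the columns within my rows. I get an error saying that the list object has no attribute findAll. I found this method through another post on Stack Exchange, and it was not an issue there. (BeautifulSoup HTML table parsing)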

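I realize that findAll is a method of BeautifulSoup, not of Python lists. The weird part is that the findAll method works when I find the rows within the table list (I only need the second table on the page), but fails when I attempt to find the columns in the rows list.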

Here's my code:

from urllib.request import URLopener
from bs4 import BeautifulSoup

opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)

table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)

Answer 1

findAll() returns a result list; you need to loop over those results, or pick one, to get to another contained element with its own findAll() method:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')
    print(cols)

or pick one row:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td')  # columns of the *first* row.
print(cols)

Note that findAll is deprecated; you should use find_all() instead.
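For anyone who wants to try this without fetching the page from the question, here is a minimal sketch using find_all(). The inline HTML and its cell values are made up for illustration, and the "html.parser" backend is just an assumption to keep the example free of extra dependencies; the real page will of course look different.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the page in the question.
html = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>Acme</td><td>Sacramento</td></tr>
  <tr><td>Globex</td><td>Oakland</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

table = soup.find_all("table")[1]  # second table on the page (index 1)
rows = table.find_all("tr")        # a result list of <tr> Tag objects

# Each row is a Tag, so find_all() is available on it, unlike on `rows` itself.
data = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]
print(data)
```

The point is the same as above: call find_all() on each individual row (a Tag), not on the list of rows.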
