Question

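I'm analyzing data from a website that I saved as a local file. I can parse some of the text without problems, but the next part is where I'm stuck. The HTML I want to parse is commented out, so I save that data to a local file and turn it into HTML. I can navigate to the tbody, but I can't get each tr. The for loop seems to be stuck on the first iteration.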

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
from csv import writer


response = requests.get('https://www.pro-football-reference.com/teams/buf/2016_roster.htm')
soup = BeautifulSoup(response.text, 'html.parser')


comments=soup.findAll(string=lambda text:isinstance(text,Comment))
body=comments[18]
file=open('file.html','w')
file.write('<html>')
file.write(body)
file.write('</html>')
file.close()



soup_local = BeautifulSoup(open('file.html'), 'html.parser')
test = soup_local.tbody
for item in test:
    Number=test.th.get_text()
    print(Number)

I expect ~60 different numbers, but this prints the same number 100+ times.
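A likely explanation of the symptom, shown on a small made-up table (not the real roster). Inside the loop, test.th is looked up on tbody rather than on the loop variable item, so it always returns the first th under tbody. Iterating over a Tag also yields every direct child, including the whitespace-only strings between rows, so the loop runs more times than there are rows. This sketch reproduces both effects:

```python
from bs4 import BeautifulSoup, Tag

# Small made-up table with newlines between rows, like real-world HTML.
html = """<table><tbody>
<tr><th>1</th></tr>
<tr><th>2</th></tr>
<tr><th>3</th></tr>
</tbody></table>"""

tbody = BeautifulSoup(html, "html.parser").tbody

# Pattern from the question: .th is looked up on tbody, not on the loop item,
# so every iteration returns the first <th> under tbody.
buggy = [tbody.th.get_text() for item in tbody]

# Iterating a Tag yields every direct child: the <tr> tags plus the "\n" strings.
children = list(tbody)
rows = [child for child in tbody if isinstance(child, Tag)]

# Asking each row for its own <th> gives one value per row.
fixed = [row.th.get_text() for row in rows]

print(len(children), len(rows))  # 7 3
print(buggy)  # ['1', '1', '1', '1', '1', '1', '1']
print(fixed)  # ['1', '2', '3']
```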

Answer 1

You want to find the 'tbody' element and then find all of the 'th' elements inside it. Change the last 5 lines to:

soup_local = BeautifulSoup(open('file.html'), 'html.parser')
test = soup_local.find('tbody')
for item in test.find_all('th'):
    Number=item.get_text()
    print(Number)
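If you need each whole row rather than only the th values (the question asks for each tr, and imports writer from csv), a minimal sketch is to loop over the tr elements and ask each row for its own cells. The table below is made up, the row layout (one th followed by td cells) is an assumption, and io.StringIO stands in for a real output file:

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up two-row table standing in for the saved file.html.
html = """<table><tbody>
<tr><th>17</th><td>Player A</td><td>QB</td></tr>
<tr><th>25</th><td>Player B</td><td>RB</td></tr>
</tbody></table>"""

soup_local = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup_local.find("tbody").find_all("tr"):
    if tr.th is None:
        continue  # skip any row that has no header cell
    number = tr.th.get_text(strip=True)  # the row's own <th>
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append([number] + cells)

# io.StringIO stands in for a real CSV file here.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

To write to disk instead, open the output file with newline='' as the csv module documentation recommends and pass that file object to csv.writer.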

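Separately from the loop bug, the round trip through file.html is not strictly needed. A bs4 Comment is a string subclass, so its text can be passed straight back to BeautifulSoup. The sketch below assumes the wanted table is the first comment that contains a table tag, instead of relying on a fixed index such as comments[18]; the page string is made up and stands in for response.text:

```python
from bs4 import BeautifulSoup, Comment

# Made-up stand-in for response.text: the table is hidden inside an HTML comment.
page = """
<div id="all_roster">
<!--
<table id="roster"><tbody>
<tr><th>1</th><td>Player A</td></tr>
<tr><th>2</th><td>Player B</td></tr>
</tbody></table>
-->
</div>
"""

soup = BeautifulSoup(page, "html.parser")

# Collect every HTML comment on the page.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

# Assumption: the first comment containing "<table" is the one we want.
table_html = next(c for c in comments if "<table" in c)

# Re-parse the comment text in memory; no temporary file is written.
table_soup = BeautifulSoup(str(table_html), "html.parser")
numbers = [th.get_text() for th in table_soup.find("tbody").find_all("th")]
print(numbers)  # ['1', '2']
```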