I'm trying to find the following specific table in a given HTML page:
<table class="sidearm-table collapse-on-medium accordion" accordion-table="" sortable-table="">
<caption>Tennessee Tech<span class="hide"> - Pitching Stats</span></caption>
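My approach is to find the caption and then walk up to its parent table, whose rows I will iterate over to find the text I want (I can do that part myself). I suspect my error comes from the fact that the caption text continues into a span tag, but I'm not sure that is the case. The code I'm using is below; it keeps returning None because it can't find the table (my syntax may be incorrect):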
from bs4 import BeautifulSoup
import re
import requests
header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}
redirect = requests.get('http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117', headers = header).text
soup = BeautifulSoup(redirect, 'html.parser')
test = soup.find('caption', text = 'Tennessee Tech').find_parent('table', {'class': 'sidearm-table collapse-on-medium accordion'})
Answer 0 (score: 2)
I would try finding all the captions and then matching on the caption text, like this:
from bs4 import BeautifulSoup
import re
import requests
header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}
redirect = requests.get('http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117', headers = header).text
soup = BeautifulSoup(redirect, 'html.parser')
for caption in soup.find_all('caption'):
    # get_text() joins the caption's own text with the text inside the nested <span>
    if caption.get_text() == 'Tennessee Tech - Pitching Stats':
        table = caption.find_parent('table', {'class': 'sidearm-table collapse-on-medium accordion'})
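To illustrate why the original `find('caption', text=...)` call probably comes back empty, here is a minimal sketch on an inline copy of the markup from the question. It assumes the live page nests the caption the same way; the helper `find_table_by_caption` and the added `<tr>` row are hypothetical, for illustration only. As far as I understand it, a string filter on a tag is compared against the tag's `.string`, which is None when the tag has more than one child (here, a text node plus the `<span>`). The sketch uses `string=`, the newer name for the `text=` argument.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup from the question (the <tr> row is made up for the demo)
html = """
<table class="sidearm-table collapse-on-medium accordion">
<caption>Tennessee Tech<span class="hide"> - Pitching Stats</span></caption>
<tr><td>row</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
caption = soup.find("caption")

# The caption has two children (a text node and a <span>), so .string is None
caption_string = caption.string

# An exact string filter on the tag therefore finds nothing
exact_match = soup.find("caption", string="Tennessee Tech")


def find_table_by_caption(soup, prefix):
    """Hypothetical helper: first <table> whose full caption text starts with prefix."""
    for cap in soup.find_all("caption"):
        # get_text() includes the text of nested tags such as the <span>
        if cap.get_text().strip().startswith(prefix):
            return cap.find_parent("table")
    return None


table = find_table_by_caption(soup, "Tennessee Tech")
print(caption_string, exact_match, table is not None)
```

Matching on a prefix rather than the full string is a design choice for the sketch: it tolerates stray whitespace and the hidden suffix, but it would also match any other caption that starts with the same team name.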
Answer 1 (score: 0)
Running:
from bs4 import BeautifulSoup
html = """
<table class="sidearm-table collapse-on-medium accordion" accordion-table="" sortable-table="">
<caption>
Tennessee Tech
<span class="hide"> - Pitching Stats</span>
</caption>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'sidearm-table'})
print(table.contents)
Output:
['\n', <caption>
Tennessee Tech
<span class="hide"> - Pitching Stats</span>
</caption>, '\n']
However, I couldn't reach your URL (it timed out):
http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117
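Since the page itself couldn't be checked, here is a small offline sketch of how the class filter behaves, which may matter if the live page lists the classes in a different order. The markup is a trimmed copy of the table tag from the question, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table tag from the question
html = (
    '<table class="sidearm-table collapse-on-medium accordion">'
    "<caption>Tennessee Tech</caption></table>"
)
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag that carries that class
by_one_class = soup.find("table", class_="sidearm-table")

# A multi-class string is compared against the exact attribute value,
# so the same classes in a different order do not match
by_reordered = soup.find("table", class_="accordion sidearm-table collapse-on-medium")

# A CSS selector matches tags that have all the listed classes, in any order
by_selector = soup.select_one("table.accordion.sidearm-table")

print(by_one_class is not None, by_reordered is None, by_selector is not None)
```

If the class order on the real page is not guaranteed, filtering on one class or using a CSS selector is likely the safer choice than the full class string.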