Finding a specific table with BeautifulSoup by its caption

Asked: 2017-07-03 21:06:22

Tags: python beautifulsoup

I am trying to find the specific table below in a given HTML document:

<table class="sidearm-table collapse-on-medium accordion" accordion-table="" sortable-table="">
                        <caption>Tennessee Tech<span class="hide"> - Pitching Stats</span></caption>

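My approach is to find the caption and then walk up to its parent table, whose rows I will iterate over to find the text I want (I can do that part myself). I believe my mistake is related to the caption text continuing into a span tag, but I am not sure that is the case. The code I am using is below, but it keeps coming back with None because it cannot find the table (my syntax may be incorrect):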

from bs4 import BeautifulSoup
import re
import requests

header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

redirect = requests.get('http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117', headers = header).text
soup = BeautifulSoup(redirect, 'html.parser')

test = soup.find('caption', text = 'Tennessee Tech').find_parent('table', {'class': 'sidearm-table collapse-on-medium accordion'})

2 answers:

Answer 0 (score: 2)

I would try finding all the captions and then matching on the caption text, like this:

from bs4 import BeautifulSoup
import re
import requests


header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

redirect = requests.get('http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117', headers = header).text
soup = BeautifulSoup(redirect, 'html.parser')

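# Compare against the full caption text, including the text inside the nested <span>
table = None
for caption in soup.find_all('caption'):
    if caption.get_text() == 'Tennessee Tech - Pitching Stats':
        table = caption.find_parent('table', {'class': 'sidearm-table collapse-on-medium accordion'})
        break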
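
As for why the lookup in the question finds nothing: a minimal sketch, using a trimmed copy of the HTML from the question rather than the live page. A text filter on `find` is compared against the tag's `.string`, and `.string` is None when a tag has more than one child, which is the case here because of the nested span. The sketch uses `string=`, the newer name for the `text=` argument, and the helper `is_tech_caption` is a hypothetical name made up for this illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (the <tr> is made up for illustration)
html = """
<table class="sidearm-table collapse-on-medium accordion">
<caption>Tennessee Tech<span class="hide"> - Pitching Stats</span></caption>
<tr><td>row</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
caption = soup.find('caption')

# The caption has two children (a text node and a <span>), so .string is None
print(caption.string)
# get_text() joins the text of all descendants
print(caption.get_text())

# The string filter is compared against .string, so nothing matches here
missing = soup.find('caption', string='Tennessee Tech')


def is_tech_caption(tag):
    """Hypothetical filter: a <caption> whose full text starts with the team name."""
    return tag.name == 'caption' and tag.get_text().strip().startswith('Tennessee Tech')


# Passing a function to find() lets the match use the full text instead
found = soup.find(is_tech_caption)
table = found.find_parent('table') if found is not None else None
```

Checking `found` before calling `find_parent` avoids an AttributeError when no caption matches.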

Answer 1 (score: 0)

Running:

from bs4 import BeautifulSoup


html = """
<table class="sidearm-table collapse-on-medium accordion" accordion-table="" sortable-table="">
<caption>
Tennessee Tech
<span class="hide"> - Pitching Stats</span>
</caption>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table', {'class': 'sidearm-table'})

print(table.contents)

Output:

['\n', <caption>
Tennessee Tech
<span class="hide"> - Pitching Stats</span>
</caption>, '\n']

However, I could not reach your URL (the request timed out):

http://goblueraiders.com/boxscore.aspx?path=baseball&id=6117
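
Since I could not load the page, the following is only a sketch under an assumption: that the real page contains several tables with the `sidearm-table` class (for example batting and pitching), so the class lookup has to be combined with a caption check. The sample HTML, including the batting table and the row contents, is made up for illustration, and `find_table_by_caption` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample: two tables sharing the same classes, told apart only by caption
html = """
<table class="sidearm-table collapse-on-medium accordion">
<caption>Tennessee Tech<span class="hide"> - Batting Stats</span></caption>
<tr><td>batter row</td></tr>
</table>
<table class="sidearm-table collapse-on-medium accordion">
<caption>Tennessee Tech<span class="hide"> - Pitching Stats</span></caption>
<tr><td>pitcher row</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')


def find_table_by_caption(soup, wanted):
    """Return the first sidearm-table whose full caption text equals `wanted`, else None."""
    # class_ matches a single CSS class, so the other classes on the table do not matter
    for table in soup.find_all('table', class_='sidearm-table'):
        caption = table.find('caption')
        if caption is not None and caption.get_text().strip() == wanted:
            return table
    return None


table = find_table_by_caption(soup, 'Tennessee Tech - Pitching Stats')
rows = table.find_all('tr') if table is not None else []
cells = [td.get_text() for row in rows for td in row.find_all('td')]
```

Matching on one class with `class_` is a little more forgiving than comparing the whole class string, in case the site adds or reorders classes.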