Question

I am trying to extract the Subject column from this website (the URL is in the code below).

My code so far:

# Python 3: urllib2 was split into urllib.request and urllib.error
from urllib.request import urlopen
from bs4 import BeautifulSoup
quote_page = 'https://www.fda.gov/ICECI/EnforcementActions/WarningLetters/2017/default.htm'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
# find_all returns a list of every <table> element with this id
print(soup.find_all("table", id="WarningLetter_sortid"))

I can find the table, but I can't figure out how to extract the Subject column. Thanks for your help!

Answer 1

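You can select all rows in the table (skipping the first one, which is the header row) and take the fourth td element from each row. Calling a tag directly, as in row('td'), is shorthand for row.find_all('td'):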

rows = soup.select("table#WarningLetter_sortid tr")[1:]
print([row('td')[3].get_text(strip=True) for row in rows])
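The row-by-row approach raises an IndexError if any row has fewer than four td cells. A minimal sketch of a more defensive variant follows, run against a small inline table rather than the live page. The sample markup and the helper name extract_column are hypothetical, introduced only for illustration; the real page's layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table id="WarningLetter_sortid">
  <tr><th>Company</th><th>Date</th><th>Office</th><th>Subject</th></tr>
  <tr><td>Acme</td><td>01/02/2017</td><td>Center A</td><td>Labeling</td></tr>
  <tr><td>Short row</td></tr>
  <tr><td>Globex</td><td>03/04/2017</td><td>Center B</td><td>Adulterated Drugs</td></tr>
</table>
"""


def extract_column(html, table_id, index):
    """Return the text of the td at position `index` in each row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []  # no table with that id
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Rows made only of <th> cells, or with too few <td> cells, are skipped.
        if len(cells) > index:
            values.append(cells[index].get_text(strip=True))
    return values


subjects = extract_column(SAMPLE_HTML, "WarningLetter_sortid", 3)
print(subjects)
```

Checking the cell count replaces the [1:] slice here: a header row built from th cells has no td cells, so it is skipped along with the short row.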

You can also do it in one pass with the nth-of-type() CSS selector:

print([row.get_text(strip=True) 
       for row in soup.select("table#WarningLetter_sortid tr > td:nth-of-type(4)")])
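Note that the selector version has no [1:] slice. This works when the header row is built from th cells, because td:nth-of-type(4) only matches td elements. The sketch below illustrates that on a small inline table; the sample markup is hypothetical, and whether the real page uses th for its header is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the header row uses <th>, the data rows use <td>.
SAMPLE_HTML = """
<table id="WarningLetter_sortid">
  <tr><th>Company</th><th>Date</th><th>Office</th><th>Subject</th></tr>
  <tr><td>Acme</td><td>01/02/2017</td><td>Center A</td><td>Labeling</td></tr>
  <tr><td>Globex</td><td>03/04/2017</td><td>Center B</td><td>Adulterated Drugs</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The fourth <td> among the <td> children of each row; <th> cells never match.
cells = soup.select("table#WarningLetter_sortid tr > td:nth-of-type(4)")
subjects = [cell.get_text(strip=True) for cell in cells]
print(subjects)

# The matching header cell can be selected the same way, using th instead of td.
header = soup.select_one("table#WarningLetter_sortid tr > th:nth-of-type(4)")
print(header.get_text(strip=True))
```

If the header row used td cells instead, its fourth cell would be included in the result and would need to be dropped, for example by slicing off the first item.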
