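Printing a specific line (BeautifulSoup)

Time: 2016-06-16 13:01:13

Tags: python beautifulsoup

Currently, my code parses the link and prints all of the information on the website. I only want to print one specific line from the website. How can I do that?

Here is my code: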

from bs4 import BeautifulSoup
import urllib.request

r = urllib.request.urlopen("Link goes here").read()
soup = BeautifulSoup(r, "html.parser")

# This is what I want to change. I currently have it printing everything.
# I just want a specific line from the website

print(soup.prettify())

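2 answers:

Answer 0 (score: 3)

Don't try to parse the tds out of the pretty-printed output. Select the tag directly instead: if the class name is unique, select on it, and if an attribute is unique, select on that: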

td = soup.select_one("td.content")
td = soup.select_one('td[colspan="3"]')

If it is the fourth td:

td = soup.select_one("td:nth-of-type(4)")
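
As a rough offline sketch of the three selectors above, here they are run against a small made-up snippet. The markup and cell texts are assumptions for illustration, not taken from the real page. The colspan value is quoted because an unquoted value that starts with a digit is not a valid CSS identifier.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this sketch (not the real page).
html = """
<table>
  <tr>
    <td>first</td>
    <td class="content">second</td>
    <td>third</td>
    <td>fourth</td>
  </tr>
  <tr>
    <td colspan="3">wide cell</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Select by class name.
by_class = soup.select_one("td.content").get_text(strip=True)

# Select by attribute value (quoted).
by_attr = soup.select_one('td[colspan="3"]').get_text(strip=True)

# Select the fourth td among the tds of its parent row.
fourth = soup.select_one("td:nth-of-type(4)").get_text(strip=True)

print(by_class, by_attr, fourth, sep=" | ")
```

select_one returns None when nothing matches, so on a real page it is worth checking the result before reading its text.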

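If it is in a specific table, select the table first and then find the td inside that table. Splitting the HTML into lines and indexing into them is actually worse than using regex to parse html.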
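
A minimal sketch of the "table first, then td" idea, assuming the target table can be identified by an id. The ids and cell values below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; ids and values are invented.
html = """
<div>
  <table id="summary"><tr><td>Owner</td><td>someone</td></tr></table>
  <table id="details"><tr><td>Class</td><td>some value</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to one table, then look for tds only inside it.
table = soup.find("table", id="details")
cells = table.find_all("td")

# The second cell of that table holds the value we are after.
value = cells[1].get_text(strip=True)
print(value)
```

Searching from the table element rather than from soup keeps cells in other tables from matching by accident.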

You can get the specific td by using the text of the bold tag that precedes it, i.e. Department of Finance Building Classification:

In [19]: from bs4 import BeautifulSoup

In [20]: import urllib.request

In [21]: url = "http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?boro=1&houseno=1&street=park+ave&go2=+GO+&requestid=0"

In [22]: r = urllib.request.urlopen(url).read()

In [23]: soup = BeautifulSoup(r, "html.parser")

In [24]: print(soup.find("b", string="Department of Finance Building Classification:").find_next("td").text)
O6-OFFICE BUILDINGS
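
The same label-then-value lookup can be tried offline. This is only a sketch: the labels and values are made up, and string= is used as the current name of the older text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each row has a bold label cell and a value cell.
html = """
<table>
  <tr><td><b>Label A:</b></td><td>value A</td></tr>
  <tr><td><b>Label B:</b></td><td colspan="3">value B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the <b> tag whose text matches the label exactly.
label = soup.find("b", string="Label B:")

# find_next("td") returns the first td that starts after the label,
# which here is the value cell in the same row.
value = label.find_next("td").get_text(strip=True) if label else None
print(value)
```

The match on the label is exact, so stray whitespace or a missing colon in the real page would make find return None.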

Selecting the nth table and row:

In [25]: print(soup.select_one('table:nth-of-type(8) tr:nth-of-type(5) td[colspan="3"]').text)
O6-OFFICE BUILDINGS
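
To see how the positional selector behaves without hitting the site, here is a small sketch on made-up markup. The table and row positions are assumptions chosen for the example, not the real page's layout.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two sibling tables inside a wrapper div.
html = """
<div>
  <table><tr><td>t1</td></tr></table>
  <table>
    <tr><td>r1</td></tr>
    <tr><td>r2</td><td colspan="3">target</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Second table among its siblings, second row in it, then the wide cell.
cell = soup.select_one('table:nth-of-type(2) tr:nth-of-type(2) td[colspan="3"]')
text = cell.get_text(strip=True)
print(text)
```

Positional selectors like this break as soon as the page adds or removes a table, so the label-based lookup is usually the sturdier choice.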

Answer 1 (score: 1)

{u'id': u'[redacted]', u'name': u'[redacted]'}