Question

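I'm trying to use Beautiful Soup's find_all or select to get the elements with 4 class names from a website.

I'm new to Python and couldn't find this answer anywhere else. I'm using Python 3.4.3 and the latest Beautiful Soup.

The elements are td cells in the site's HTML, and the attribute (class) names are text, linkb, pnum and num.

Here is my code: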

import requests
from bs4 import BeautifulSoup

r = requests.get(url)  # download the page's HTML

soup = BeautifulSoup(r.content, 'html.parser')  # parse the HTML

v_data = soup.select('.text', '.pnum', '.num', '.linkb')  # these are the class names I want
for symbol in v_data:
    print(symbol.text)

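That code doesn't work.

I can't search for td or any other single attribute on its own, because that finds or selects many other values on the site that I don't want.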

v_data = soup.select('.text')

This works, but it doesn't return the elements with the other class names.

Please help.

Thanks,

Answer 1

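BeautifulSoup's CSS selector support is limited. Note also that select() takes a single selector string; the extra strings passed in the question are not treated as additional selectors.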
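
A minimal sketch of staying in Beautiful Soup, assuming Beautiful Soup 4.7.0 or later, where select() is backed by the soupsieve library and accepts a comma-separated selector list inside one string. The HTML below is a hypothetical stand-in for the real page; prefixing each class with td keeps the matches to td cells only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<table>
  <tr>
    <td class="text">Example Corp</td>
    <td class="linkb">EXM</td>
    <td class="pnum">10.25</td>
    <td class="num">500</td>
    <td class="other">skip me</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# ONE selector string; the commas separate the four alternatives
# (assumes Beautiful Soup 4.7.0+, which uses soupsieve for CSS matching)
v_data = soup.select("td.text, td.pnum, td.num, td.linkb")

for symbol in v_data:
    print(symbol.text)
```

Matches come back in document order, not in the order the selectors are listed.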

How about using lxml's better CSS selector support:

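import lxml.html  # root.cssselect() also needs the separate cssselect package
import requests

r = requests.get(url)
root = lxml.html.fromstring(r.content)  # parse the raw bytes
# One selector string; the commas separate the four alternatives
v_data = root.cssselect('.text, .pnum, .num, .linkb')
for symbol in v_data:
    print(symbol.text_content().strip())  # all text inside the cell, trimmed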
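
Since the question also asks about find_all: find_all accepts a list for class_ and returns every tag that carries at least one of the listed classes, which avoids the CSS selector entirely. A minimal sketch, again with hypothetical HTML standing in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<table>
  <tr>
    <td class="text">Example Corp</td>
    <td class="linkb">EXM</td>
    <td class="pnum">10.25</td>
    <td class="num">500</td>
    <td class="other">skip me</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A list passed as class_ matches a <td> that has ANY one of these classes
wanted = ["text", "pnum", "num", "linkb"]
v_data = soup.find_all("td", class_=wanted)

for symbol in v_data:
    print(symbol.get_text(strip=True))
```

This relies only on Beautiful Soup's own filtering, so it does not depend on soupsieve or lxml.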

UPDATE (based on the OP's comment):

Instead of specifying the class of every td, you can select the table that contains the rows and iterate over them:

table = soup.select('table.mdcTable')[0]  # the table holding the rows
for row in table.select('tr')[1:]:  # skip the header row
    print([td.text.strip() for td in row.select('td')[1:]])  # skip the first cell
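
To see what the table-based loop produces, here is a self-contained run against a small hypothetical table. The mdcTable class comes from the answer; the header row and the leading numbering column are assumptions about the real page's layout, which is why the code skips the first row and the first cell of each row.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one header row, then data rows whose first cell is a row number
html = """
<table class="mdcTable">
  <tr><th>#</th><th>Name</th><th>Symbol</th><th>Price</th></tr>
  <tr><td>1</td><td> Example Corp </td><td>EXM</td><td>10.25</td></tr>
  <tr><td>2</td><td> Sample Inc </td><td>SMP</td><td>3.50</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

table = soup.select("table.mdcTable")[0]  # first table with class mdcTable
rows = []
for row in table.select("tr")[1:]:  # skip the header row
    # skip the first cell and strip surrounding whitespace from the rest
    cells = [td.text.strip() for td in row.select("td")[1:]]
    rows.append(cells)
    print(cells)
```

Collecting the cells per row keeps the values of one table row together, which a flat list of matched td elements does not.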

How to find_all or select multiple attribute names with Beautiful Soup?
