Question

I am trying to get some information from the following web page with BeautifulSoup:

url = 'https://web.archive.org/web/20071001215911/http://finance.rambler.ru'

Using the browser (Chrome), I copied the selector of the element I need:

selector = 'body > div.fe_global > table:nth-child(6) > tbody > tr > td:nth-child(2) > table > tbody > tr > td.fe_col-left > div:nth-child(5) > table > tbody'

However, bs4 does not support nth-child, so I replaced it with nth-of-type:

selector = selector.replace('child', 'of-type')
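A further possible cause of the empty result, offered here as an assumption and not something the answer mentions: the copied selector contains tbody steps. Browsers insert an implicit tbody into tables while building the DOM, so Chrome includes it in copied selectors, but the lxml and html.parser parsers used with Beautiful Soup do not add it. If the served HTML omits tbody, those steps match nothing. The helper strip_tbody below is hypothetical, a minimal sketch that drops such steps, assuming the selector uses only the > combinator and that no step contains a literal '>' (for example inside an attribute value).

```python
def strip_tbody(selector: str) -> str:
    """Drop 'tbody' steps from a '>'-separated selector chain (hypothetical helper).

    Assumes '>' is the only combinator and never appears inside a step.
    """
    steps = [step.strip() for step in selector.split('>')]
    return ' > '.join(step for step in steps if step != 'tbody')


chrome_selector = ('body > div.fe_global > table:nth-child(6) > tbody > tr > '
                   'td:nth-child(2) > table > tbody > tr > td.fe_col-left > '
                   'div:nth-child(5) > table > tbody')
print(strip_tbody(chrome_selector))
```

Because the original selector ended in tbody, the stripped version selects the enclosing table instead.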

Then I applied it to the soup:

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
selected_element = soup.select(selector=selector)

print(selected_element)

The output is []. I expected to get some HTML. What is the reason for this result? Thank you for your help.

Answer 1

There are two tables inside the selected div; I select the second one:

from bs4 import BeautifulSoup
import requests

url = 'https://web.archive.org/web/20071001215911/http://finance.rambler.ru'
# send a browser-like User-Agent header with the request
heads = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}
r = requests.get(url, headers=heads)
soup = BeautifulSoup(r.text, 'html.parser')
# the div contains two tables; index [1] picks the second one
selected_element = soup.select('div[class="fe_small fe_l2"] table')[1]

print(selected_element)
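The same approach can be tried offline against a small stand-in document. This is a minimal sketch: the markup and the table ids are hypothetical, and it uses the class selector div.fe_small.fe_l2 (which matches regardless of class order) in place of the exact attribute match. It also guards against the IndexError that [1] raises when fewer than two tables match, for example if the archived page differs from what is expected.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the archived page: one div with two tables.
html = '''
<div class="fe_small fe_l2">
  <table id="rates"><tr><td>A</td></tr></table>
  <table id="quotes"><tr><td>B</td></tr></table>
</div>
'''


def second_table(soup):
    """Return the second <table> inside div.fe_small.fe_l2, or None if there is none."""
    tables = soup.select('div.fe_small.fe_l2 table')
    # Indexing [1] directly would raise IndexError when fewer than two tables match.
    return tables[1] if len(tables) > 1 else None


soup = BeautifulSoup(html, 'html.parser')
table = second_table(soup)
print(table['id'] if table is not None else 'not found')  # quotes
```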

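Replacing nth-child with nth-of-type leads to unexpected results: nth-child(n) counts all element siblings, while nth-of-type(n) counts only siblings with the same tag name, so the same index can point to a different element or to none at all.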
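A small illustration of how the two pseudo-classes diverge, using hypothetical markup in which a p precedes two tables. It assumes Beautiful Soup 4.7.0 or later, where select() is backed by the soupsieve library and supports :nth-child directly, so the child-to-of-type replacement is not needed there.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: a <p> precedes two tables inside the same <div>.
html = '<div><p>intro</p><table id="first"></table><table id="second"></table></div>'
soup = BeautifulSoup(html, 'html.parser')

# :nth-child(2) counts every element child of the div, so it matches the
# table that is the div's 2nd child overall, i.e. the first table.
by_child = soup.select('div > table:nth-child(2)')

# :nth-of-type(2) counts only <table> siblings, so it matches the second table.
by_type = soup.select('div > table:nth-of-type(2)')

print([t['id'] for t in by_child])  # ['first']
print([t['id'] for t in by_type])   # ['second']
```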
