Question

Here is the code I'm using to iterate over all the elements:

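import bs4
import requests

# Fetch the page and parse it with Python's built-in html.parser
r_top = requests.get('http://www.swapnilpatni.com/law_charts_final.php')
soup_top = bs4.BeautifulSoup(r_top.text, 'html.parser')

selector = '#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a'

for link in soup_top.select(selector):
    print(link)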

The loop prints nothing, yet the same selector in JavaScript gives a length of 57:

document.querySelectorAll("#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a").length;
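For comparison, the Beautiful Soup counterpart of JavaScript's `.length` is `len()` of the list returned by `select()`. The sketch below runs the same selector against a small, hypothetical, well-formed table (not the real page). That the count comes out as expected suggests the selector syntax itself is supported. `td:nth-of-type(3)` matches a `<td>` that is the third `<td>` among its siblings. The helper name `count_matches` is made up for illustration.

```python
from bs4 import BeautifulSoup

SELECTOR = '#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a'

# Hypothetical, well-formed markup: an element carrying the id wraps a table
# whose rows hold a link in their third cell.
SAMPLE_HTML = """
<div id="ContentPlaceHolder1_gvDisplay">
  <table>
    <tr><td>1</td><td>Title A</td><td><a href="a.png">A</a></td></tr>
    <tr><td>2</td><td>Title B</td><td><a href="b.png">B</a></td></tr>
    <tr><td>3</td><td>Title C</td><td><a href="c.png">C</a></td></tr>
  </table>
</div>
"""


def count_matches(html, selector, parser='html.parser'):
    """Hypothetical helper: Python analog of querySelectorAll(selector).length."""
    soup = BeautifulSoup(html, parser)
    return len(soup.select(selector))


print(count_matches(SAMPLE_HTML, SELECTOR))  # 3
```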

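I thought maybe I wasn't fetching the page content correctly, so I saved a local copy of the page, but the selector in Beautiful Soup still selects nothing. What is going on?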

Here is the website I'm using the code on.

Answer 1

This seems to be caused by the parser you are using (i.e. html.parser). If I try the same thing with lxml as the parser:

from bs4 import BeautifulSoup
import requests

url = 'http://www.swapnilpatni.com/law_charts_final.php'
r = requests.get(url)
r.raise_for_status()

soup = BeautifulSoup(r.text, 'lxml')
css_select = '#ContentPlaceHolder1_gvDisplay table tr td:nth-of-type(3) a'
links = soup.select(css_select)
print('{} link(s) found'.format(len(links)))
# Output: 1 link(s) found

for link in links:
    print(link['href'])
# Output: spadmin/doc/Company Law amendment 1.1.png

lxml does return a result where html.parser returns none, but even so it only returns the link from the first #ContentPlaceHolder1_gvDisplay table tr.
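To see the parser dependence directly, one could run the same selector through every parser that happens to be installed and compare the counts. This is a diagnostic sketch, not part of the original answer, and `compare_parsers` is a hypothetical name. Beautiful Soup raises `bs4.FeatureNotFound` when a requested parser (such as `lxml` or `html5lib`) is not installed, so missing parsers are simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(html, selector, parsers=('html.parser', 'lxml', 'html5lib')):
    """Hypothetical helper: return {parser_name: match_count} for installed parsers."""
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # this parser is not installed; skip it
        counts[parser] = len(soup.select(selector))
    return counts


# Usage on a tiny, hypothetical, well-formed snippet. For the real page one
# would pass r.text from the answer's code instead.
SAMPLE = '<div id="x"><table><tr><td>a</td><td><a href="1">1</a></td></tr></table></div>'
print(compare_parsers(SAMPLE, '#x table tr td:nth-of-type(2) a'))
```

On well-formed markup every installed parser should agree; a mismatch on the live page would point at malformed HTML that the parsers repair differently.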

Running the URL through the W3 Markup Validation Service returns this error:

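Sorry, I am unable to validate this document because on line 1212 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: utf8 "\xA0" does not map to Unicode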
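The validator's complaint can be reproduced with the standard library: a lone `0xA0` byte (a non-breaking space in Windows-1252/Latin-1) is a UTF-8 continuation byte, so it cannot start a character and strict UTF-8 decoding fails. The sketch uses made-up bytes, not the real page, and assumes Windows-1252 (`cp1252`) as the page's actual encoding; `try_decode` is a hypothetical helper. Passing Beautiful Soup the raw bytes (for example `r.content` rather than `r.text`) lets it do the decoding, and its `from_encoding` argument overrides its guess.

```python
from bs4 import BeautifulSoup

# Made-up bytes standing in for the offending line (an assumption, not the
# real content): 0xA0 is a non-breaking space in Windows-1252/Latin-1,
# but on its own it is not a valid UTF-8 sequence.
raw = b'<td>Company\xa0Law</td>'


def try_decode(data, encoding):
    """Hypothetical helper: decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_decode(raw, 'utf-8'))  # None: 0xA0 cannot start a UTF-8 character
print(try_decode(raw, 'cp1252') == '<td>Company\xa0Law</td>')  # True

# Give Beautiful Soup the bytes and state the (assumed) encoding explicitly.
soup = BeautifulSoup(raw, 'html.parser', from_encoding='cp1252')
print(repr(soup.td.get_text()))
```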

html.parser may well be choking on this, while lxml is more fault-tolerant.

Beautiful Soup not selecting any elements
