Question

I am trying to use Python to scrape the data in the table on this web page.

http://www.dividendyieldhunter.com/exchanged-traded-debt-issues-sorted-alphabetically/

I have tried using requests and bs4. I get the raw HTML, but the data appears to be hidden. What should I do?

Answer 1

That particular page loads its data from the URL in the iframe in this code:

<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0" style="display: block; width: 100%; height: 100%;"></iframe>

You need to make a further request for the HTML at the URL in the src attribute:

https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0
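Rather than copying the URL by hand, the src attribute can be read from the outer page with bs4. This is a minimal sketch, assuming the iframe is present in the HTML that requests returns and keeps the id shown above. The helper name find_sheet_url is made up for illustration, and the sample HTML is a shortened stand-in with a placeholder URL, not the real page.

```python
from bs4 import BeautifulSoup


def find_sheet_url(page_html):
    """Return the src of the iframe with id 'pageswitcher-content', or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    iframe = soup.find("iframe", id="pageswitcher-content")
    return iframe.get("src") if iframe else None


# Shortened stand-in for the real page (the URL is a placeholder).
sample = (
    '<div><iframe id="pageswitcher-content" '
    'src="https://example.com/sheet?headers=false&amp;gid=0"></iframe></div>'
)
sheet_url = find_sheet_url(sample)
print(sheet_url)
```

The parser decodes entities inside attribute values, so a URL obtained this way should already contain a plain & instead of &amp;.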

Then you can scrape the table with class="waffle".
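A rough sketch of that step with bs4 might look like the following. It assumes the sheet HTML contains a table with class "waffle" as described above. The function name parse_waffle_table and the sample rows are invented for illustration, and only td cells are collected (add "th" to the find_all call if header cells are wanted).

```python
from bs4 import BeautifulSoup


def parse_waffle_table(sheet_html):
    """Return the rows of the first table with class 'waffle' as lists of cell text."""
    soup = BeautifulSoup(sheet_html, "html.parser")
    table = soup.find("table", class_="waffle")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that contain no data cells
            rows.append(cells)
    return rows


# Made-up stand-in for the sheet HTML; the real sheet has different columns.
sample = """
<table class="waffle">
  <tr><td>Ticker</td><td>Coupon</td></tr>
  <tr><td>ABC</td><td>6.50%</td></tr>
</table>
"""
rows = parse_waffle_table(sample)
print(rows)
```

In practice, the text of the response from the Google Sheets URL would be passed in place of sample.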

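Note: pay attention to the URL query parameters from the original URL, as in the example below.

For example, the &amp; near the end must be converted to a single & character for the requests module to find the correct URL, e.g.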

import requests

# Note the plain "&" (not "&amp;") before gid=0.
res = requests.get("https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0")
print(res.text)
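If the URL is copied straight out of raw HTML source, the conversion from &amp; to & does not have to be done by hand. As a small sketch, the standard library's html.unescape can decode the entity before the URL is passed to requests. The variable names here are illustrative.

```python
import html

# The src value exactly as it appears in the raw HTML, with the escaped ampersand.
raw_src = (
    "https://docs.google.com/spreadsheets/d/"
    "1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet"
    "?headers=false&amp;gid=0"
)

# Decode HTML entities so the query string uses a plain "&".
clean_url = html.unescape(raw_src)
print(clean_url)
```

This should be applied only once, and only to text taken from HTML source, since an already-clean URL does not need decoding.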

How do I scrape data from the Google Docs sheet on this web page?
