Question

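Is there any way to get the URLs of all the requests a page makes? For example, when I visit a site, the browser sends several network requests to the server for the HTML, CSS and other files needed to display the page. I want to get all of these asset URLs using Python. Basically, I want to get all the URLs from the screenshot below. Can someone point me in the right direction?

PS: I want to automate this with a script. I know I could use Wireshark for this, but then I would not be able to automate it.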


Answer 1

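For anyone else interested: the following snippet, which I got from a post on pythoncode, worked for me.

I'm sure it has some of the limitations mentioned above (different browsers, different paths, and so on make this kind of hack less general), but it may save the day.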

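import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://example.com"  # placeholder: the page to inspect
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# collect the URL of every <link> tag (stylesheets, icons, ...)
css_files = []
for css in soup.find_all("link"):
    href = css.attrs.get("href")
    if href:
        # resolve relative paths against the page URL
        css_files.append(urljoin(url, href))
print(css_files)  # list of URLs referenced by <link> tags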
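
The same idea can be extended beyond link tags. What follows is only a minimal sketch using the standard library's html.parser. The ASSET_ATTRS mapping, the AssetCollector class, the collect_asset_urls function and the sample HTML are all hypothetical names made up for illustration, not part of the original post. It only sees URLs written in the static HTML, so requests that JavaScript issues at runtime will not show up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical mapping: tag name -> attribute that carries an asset URL.
ASSET_ATTRS = {"link": "href", "script": "src", "img": "src"}


class AssetCollector(HTMLParser):
    """Collects absolute URLs of assets referenced in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # resolve relative paths against the page URL
                self.urls.append(urljoin(self.base_url, value))


def collect_asset_urls(html, base_url):
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Made-up sample page, used only to demonstrate the function.
sample = """<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="app.js"></script>
</head><body><img src="https://cdn.example.com/logo.png"></body></html>"""

print(collect_asset_urls(sample, "https://example.com/docs/"))
```

To use it on a real page, pass the downloaded HTML text and the page URL to collect_asset_urls. More tags (for example source or iframe) could be added to the mapping if needed.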

Get all request URLs of a page in Python
