Question

How can I get a page's URL after parsing it with BeautifulSoup?

import requests
from bs4 import BeautifulSoup

res = requests.get('http://www.example.com')
soup = BeautifulSoup(res.text, 'lxml')

How can I get http://www.example.com back from soup?

Answer 1

Try this:

soup.url = 'http://www.example.com'

After you pass soup to a function, that function can read soup.url to get http://www.example.com.
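A minimal sketch of this idea, assuming a static HTML string in place of res.text so it runs without a network call. The describe helper and the page_url variable are hypothetical names used only for illustration, and soup.url is an ordinary Python attribute set by hand, not something BeautifulSoup provides.

```python
from bs4 import BeautifulSoup

# Static HTML stands in for res.text so the sketch runs offline.
html = "<html><head><title>Example</title></head><body></body></html>"
page_url = "http://www.example.com"  # in real code this could be res.url

soup = BeautifulSoup(html, "html.parser")
soup.url = page_url  # plain attribute set on the object, not part of the bs4 API


def describe(soup):
    """Hypothetical helper: reads the URL that was attached to soup earlier."""
    return f"{soup.title.string} ({soup.url})"


summary = describe(soup)
print(summary)
```

One thing to watch for: BeautifulSoup treats unknown attribute names as tag lookups, so if soup.url was never assigned, reading it searches for a <url> tag and will most likely return None instead of raising an error.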

Answer 2

You can get the URL from the response object like this:

res = requests.get('http://www.example.com')
soup = BeautifulSoup(res.text, 'lxml')
res.url

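BeautifulSoup is a markup parser, so it only knows about the HTML in res.text that you passed to it. If the site's URL appears somewhere in the page, you can use BeautifulSoup to parse the corresponding element and read the URL from it.

Even so, that is far from the best approach.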
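For completeness, reading the URL out of the markup might look like the sketch below. It assumes the page declares a canonical link in its head, which many pages do not. The HTML string and the url_from_markup helper are hypothetical and only illustrate the lookup.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; a real page may not contain a canonical link at all.
html = """
<html><head>
  <link rel="stylesheet" href="/static/site.css">
  <link rel="canonical" href="http://www.example.com/">
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def url_from_markup(soup):
    """Return the canonical URL declared in the page, or None if absent."""
    link = soup.find("link", rel="canonical")
    return link.get("href") if link else None


canonical = url_from_markup(soup)
print(canonical)
```

Filtering on rel="canonical" matters here because the first link tag in a page is often a stylesheet or icon. When the page declares nothing, the helper returns None, which is why res.url remains the more dependable source.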

Answer 3

If the page has a link tag, you can use it to get the URL:

link = soup.find('link')
print(link['href'])

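Otherwise, if the URL does not appear in any HTML tag, you cannot get it with BeautifulSoup. In that case, use res.url as @Simas suggests above, or use request.Request (effectively the same as { {1}}, but used differently), like this: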

res.url