Question

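I am trying to scrape some data from an intranet site at work. I am testing the code below, and it looks fine, but it almost seems as if it is hitting the wrong URL. If I right-click the page and choose "View Page Source", I can see a bunch of links (anchors) that I want to scrape, but what Python actually prints is completely different from what I see in "View Page Source".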

from bs4 import BeautifulSoup as bs
import requests
from lxml import html
import urllib.request

REQUEST_URL = 'https://corp-intranet-internal.com/admin/?page=0'
# Authenticated request; its body is stored in xml_data but never parsed below
response = requests.get(REQUEST_URL, auth=('fname.lname@gmail.com', 'my_pass'))
xml_data = response.text.encode('utf-8', 'ignore')
# Second request to the same URL through urllib, made without the credentials
html_page = urllib.request.urlopen(REQUEST_URL)
delay = 5 # seconds
# Parse the urllib response and print the href of every anchor
soup = bs(html_page, "lxml")
for link in soup.findAll('a'):
    print(link.get('href'))

I tried the same idea with Selenium, but the results still did not match "View Page Source". Any idea what is wrong here? Thanks.
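
One possible cause, offered as an assumption rather than a confirmed diagnosis: the page being parsed above comes from `urllib.request.urlopen`, which is called without the credentials, so the server may be answering with a login or redirect page instead of the admin page. A minimal sketch that parses the body of the authenticated `requests` response instead is shown here. The helper names `extract_links` and `fetch_links` are hypothetical, and it is only assumed that the intranet accepts HTTP Basic auth, which is what the `auth=` argument sends.

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html_text):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url, username, password):
    """Fetch url with HTTP Basic auth (an assumption) and return its links."""
    response = requests.get(url, auth=(username, password), timeout=10)
    # Fail loudly on 401/403 instead of silently parsing an error page
    response.raise_for_status()
    return extract_links(response.text)


# Offline check of the parsing step on a small HTML fragment
sample = '<p><a href="/admin/?page=1">next</a> <a name="top">no href</a></p>'
print(extract_links(sample))  # ['/admin/?page=1']
```

With real credentials, the call would be `fetch_links(REQUEST_URL, 'fname.lname@gmail.com', 'my_pass')`. If the site signs users in through an HTML form or single sign-on rather than Basic auth, this sketch would still receive the login page, and printing `response.status_code` and `response.url` should show whether that is happening.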

How to force Python to navigate to a web page and print all anchors (a-html)

0 answers: