Question

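I am writing a small web scraper to retrieve some information from several pages of a website. I am stuck on the following error, because the GET request does not retrieve the entire HTML code: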

Traceback (most recent call last):
    name = soup.find("div",class_="productTitle").text
AttributeError: 'NoneType' object has no attribute 'text'

Here is the code snippet:

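import requests
from bs4 import BeautifulSoup

Title = []

# Read one URL per line, dropping trailing newlines and blank lines
with open('Continente_Links.txt') as URL_file:
    URL_list = [line.strip() for line in URL_file if line.strip()]

for item in URL_list:
    results = requests.get(item, timeout=2.50, stream=True)
    print(results.status_code == requests.codes.ok)
    print(results.text)
    print(results.headers)

    soup = BeautifulSoup(results.text, "html.parser")

    # name: soup.find() returns None when the div is not in the fetched HTML,
    # which is what raises the AttributeError shown above
    name = soup.find("div", class_="productTitle").text
    Title.append(name)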

The first link contained in the Continente_Links.txt file

Why doesn't the GET request retrieve the complete HTML code?

Answer 1

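I think it is because of JavaScript: the product title is probably added to the page by a script after it loads, so it is not in the HTML that requests receives. I wrote the scraping code with Selenium instead: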

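from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

Title = []

firefox_options = webdriver.FirefoxOptions()
firefox_options.add_argument('--headless')  # comment this out to turn off headless mode
# Selenium 4 style: the driver path goes in a Service, the options in options=
service = Service(executable_path='geckodriver.exe')
driver = webdriver.Firefox(service=service, options=firefox_options)

driver.get("https://www.continente.pt/stores/continente/pt-pt/public/Pages/ProductDetail.aspx?ProductId=4040367(eCsf_RetekProductCatalog_MegastoreContinenteOnline_Continente)")
# find_element(By.CLASS_NAME, ...) replaces the removed find_element_by_class_name
name = driver.find_element(By.CLASS_NAME, 'productTitle').text
Title.append(name)
print(Title)
driver.quit()  # end the session and close the browser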
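
One way to check the JavaScript explanation before switching to Selenium is to test whether the HTML text returned by requests contains an element with the productTitle class at all. Below is a minimal sketch that uses only the standard library. The names ClassFinder and has_class and the two sample HTML strings are hypothetical, made up for illustration; they do not come from the site in the question.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any tag carrying a given CSS class appears in the HTML."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute may
        # hold several space-separated class names
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.found = True


def has_class(html, wanted_class):
    """Return True if the raw HTML contains a tag with the given class."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.found


# Hypothetical responses: one rendered on the server, one filled in by a script
server_rendered = '<div class="productTitle">Example</div>'
script_rendered = '<div id="app"></div><script>render()</script>'

print(has_class(server_rendered, "productTitle"))
print(has_class(script_rendered, "productTitle"))
```

In the loop from the question, has_class(results.text, "productTitle") returning False would suggest that the title is added after the page loads, in which case a browser-driven approach such as the Selenium code above is the more likely fix.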
