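I am writing a small web scraper to retrieve some information from several pages of a website. I am stuck on the following error, apparently because the GET request does not retrieve the entire HTML code: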
Traceback (most recent call last):
name = soup.find("div",class_="productTitle").text
AttributeError: 'NoneType' object has no attribute 'text'
Here is the code snippet:
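import requests
from bs4 import BeautifulSoup

Title = []
URL_file = open('Continente_Links.txt', 'r+')
URL_list = URL_file.readlines()
for item in URL_list:
    # strip() drops the trailing newline that readlines() keeps on each URL
    results = requests.get(item.strip(), timeout=2.50, stream=True)
    print(results.status_code == requests.codes.ok)
    print(results.text)
    print(results.headers)
    soup = BeautifulSoup(results.text, "html.parser")
    # name (this is the line that raises the AttributeError)
    name = soup.find("div",class_="productTitle").text
    Title.append(name)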
The first link contained in the Continente_Links.txt file
Why does the GET request not fetch the complete HTML code?
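Answer 0 (score: 0)
I think it is because of JavaScript: the product title is probably filled in by a script after the page loads, so it is not in the HTML that requests receives. I wrote the scraping code with Selenium instead: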
from selenium import webdriver
from selenium.webdriver.common.by import By

Title = []
firefox_options = webdriver.FirefoxOptions()
firefox_options.add_argument('--headless')  # comment this line out to turn off headless mode
# executable_path is the Selenium 3 way to point at geckodriver; Selenium 4 uses a Service object instead
driver = webdriver.Firefox(executable_path='geckodriver.exe', options=firefox_options)
driver.get("https://www.continente.pt/stores/continente/pt-pt/public/Pages/ProductDetail.aspx?ProductId=4040367(eCsf_RetekProductCatalog_MegastoreContinenteOnline_Continente)")
name = driver.find_element(By.CLASS_NAME, 'productTitle').text
Title.append(name)
print(Title)
driver.quit()  # quit() ends the whole browser session, not just the current window
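Before switching to a browser, it may be worth confirming the JavaScript guess by checking whether the class name appears in the raw response at all. The following is only a minimal sketch: `has_marker`, `text_or_none`, and the two sample HTML strings are made-up names and data for illustration, not part of the original code or of any library.

```python
def has_marker(html, marker="productTitle"):
    """Return True if the marker string occurs anywhere in the raw HTML."""
    return marker in html


def text_or_none(node):
    """Return the stripped text of a found node, or None when the lookup found nothing.

    soup.find(...) returns None when no element matches, which is what
    triggers "'NoneType' object has no attribute 'text'".
    """
    return node.text.strip() if node is not None else None


# Hypothetical sample responses: in the first the server sends the title,
# in the second a script would have to insert it later in the browser.
static_html = '<div class="productTitle">Example product</div>'
dynamic_html = '<div id="app"></div><script src="bundle.js"></script>'

print(has_marker(static_html))   # marker is present in the server-sent HTML
print(has_marker(dynamic_html))  # marker is absent, so soup.find would return None
print(text_or_none(None))        # guarded lookup returns None instead of raising
```

In the question's loop this would amount to calling `has_marker(results.text)` on each response and wrapping the lookup as `text_or_none(soup.find("div", class_="productTitle"))`. If the marker is missing from `results.text`, the element is most likely added in the browser, and a tool that runs JavaScript, such as Selenium, is needed.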