I am trying to scrape a particular website, but I cannot get the tags I need, even though I can see them both in Inspect Element and in View Page Source. How can I retrieve those tags? Any suggestions would be appreciated.
WebScrapy.py
from bs4 import BeautifulSoup
from urllib.request import urlopen
import urllib.parse
import html5lib  # parser backend used by BeautifulSoup below
import pandas as pd
import xlsxwriter
from docx import Document
from docx.shared import Inches

document = Document()

url = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description="
Remaining_url = "&ignorear=0&N=-1&isNodeId=1"
product_name = 'Seagate 80GB 7200 RPM SATA 3.0Gb/s Internal Hard Drive (IMSourcing) Bare Drive'

p = document.add_paragraph("Product_name: " + product_name)

# URL-encode the product name; split("=")[1] keeps just the encoded value
search_words = {'text': product_name}
search_url = urllib.parse.urlencode(search_words).split("=")[1]
product_url = url + search_url + Remaining_url

# Fetch the search results page and parse it
content = urlopen(product_url).read()
soup = BeautifulSoup(content, "html5lib")
print(soup.find_all("div", class_="list-wrap"))
When I run the program, it prints an empty list. How can I fix this? Can anyone suggest a solution?
Answer 0 (score: 0)
Yes, that's right, the result list is empty: the search itself matches nothing, so instead of a product grid the page contains this message:
<div class="result-message">
  <p class="result-message-title">
    <span class="result-message-error">
      We have found 0 items that match "Seagate 80GB 7200 RPM SATA 3.0Gb/s Internal Hard Drive (IMSourcing) Bare Drive".
    </span>
  </p>
</div>
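One way to make that failure visible in code is to check for this error block before looking for the product grid. Below is a minimal sketch along those lines; the class names result-message-error and list-wrap come from the markup above, while the search() helper, its use of urllib.parse.quote_plus, and the shorter example search term are illustrative assumptions, not part of your original script:

import urllib.parse
from urllib.request import urlopen
from bs4 import BeautifulSoup

BASE_URL = "https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description="
REMAINING_URL = "&ignorear=0&N=-1&isNodeId=1"

def search(term):
    # quote_plus URL-encodes the term (spaces become '+'), which is what
    # the Description query parameter expects
    product_url = BASE_URL + urllib.parse.quote_plus(term) + REMAINING_URL
    soup = BeautifulSoup(urlopen(product_url).read(), "html5lib")

    # If the search matched nothing, the page contains the error span shown above
    error = soup.find("span", class_="result-message-error")
    if error:
        print("No results:", error.get_text(strip=True))
        return []

    # Otherwise return the product list containers
    return soup.find_all("div", class_="list-wrap")

# A shorter, less exact term is more likely to match something (illustrative only)
results = search("Seagate 80GB SATA Internal Hard Drive")
print(len(results))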
If you end up sending several GET requests in a row, you can use time.sleep() to pause between them so you do not hit the site too quickly:
time.sleep(1.5)
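For example, if you query several product names in a row, you could reuse the search() helper sketched above and pause after each request; the list of terms here is purely illustrative:

import time

terms = [
    "Seagate 80GB SATA Internal Hard Drive",
    "Western Digital 1TB SATA Internal Hard Drive",
]

for term in terms:
    results = search(term)                      # helper sketched above
    print(term, "->", len(results), "matches")
    time.sleep(1.5)                             # pause between GET requests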