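I have just started using Python and I am seeing strange behavior: most of the time my script reports an error, and only occasionally does the same code run correctly.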
import requests
from bs4 import BeautifulSoup

jblCharge4URL = 'https://www.amazon.de/JBL-Charge-Bluetooth-Lautsprecher-Schwarz-integrierter/dp/B07HGHRYCY/ref=sr_1_2_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&keywords=jbl+charge+4&qid=1562775856&s=gateway&sr=8-2-spons&psc=1'

# NOTE: `headers` is used below, but its definition is not shown in this post.

def get_page(url):
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    return soup

def get_product_name(url):
    soup = get_page(url)
    try:
        title = soup.find(id="productTitle").get_text()
        print("SUCCESS")
    except AttributeError:
        print("ERROR")

while True:
    print(get_product_name(jblCharge4URL))
Console output:
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
**SUCCESS**
None
ERROR
None
**SUCCESS**
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
ERROR
None
Thanks in advance.
Answer 0 (score: 0)
I made a few adjustments to your code; this should get you back on track.
import requests
from bs4 import BeautifulSoup

jblCharge4URL = 'https://www.amazon.de/JBL-Charge-Bluetooth-Lautsprecher-Schwarz-integrierter/dp/B07HGHRYCY/ref=sr_1_2_sspa?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&keywords=jbl+charge+4&qid=1562775856&s=gateway&sr=8-2-spons&psc=1'

def get_page(url):
    # Download the page and parse it with the built-in HTML parser.
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

def get_product_name(url):
    soup = get_page(url)
    title = None  # stays None when the element is not in the page
    try:
        # soup.find() returns None if nothing matches, so .get_text() raises AttributeError
        title = soup.find(id="productTitle").get_text().strip()
        print("SUCCESS")
    except AttributeError:
        print("ERROR")
    return title

print(get_product_name(jblCharge4URL))
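Since the function above hands back None when the title is not found, a caller could retry a limited number of times with a pause in between, rather than looping forever as in the question. The following is only a sketch under that assumption; `retry_until_found` and its parameters are hypothetical names, not part of requests or BeautifulSoup.

```python
import time


def retry_until_found(fetch, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    fetch is any zero-argument callable (hypothetical here), for example
    lambda: get_product_name(jblCharge4URL).
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts:
            # Wait a little longer after each failure instead of
            # sending requests as fast as possible.
            sleep(delay_seconds * attempt)
    # Every attempt failed.
    return None
```

It could be called as retry_until_found(lambda: get_product_name(jblCharge4URL)). The `sleep` parameter exists only so the pause can be replaced when trying the helper out without waiting.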
Answer 1 (score: 0)
What are you passing as headers in page = requests.get(url, headers=headers)?
You probably want something that convinces the server that you are a real user rather than a script. I suggest using:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
Also, while debugging this, you may want to print the value of the page variable in the except block. Printing page.text gives you the HTML of the page, and you can then dig through the source to see what went wrong.
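As a rough illustration of that debugging advice, the sketch below separates parsing from downloading and saves the HTML whenever the title is missing, so the failing response can be inspected afterwards. The names `extract_title`, `fetch_and_report` and `dump_path`, and the 10-second timeout, are assumptions made for this example; the User-Agent string is the one suggested in this answer.

```python
import requests
from bs4 import BeautifulSoup

# User-Agent value copied from the suggestion in this answer.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}


def extract_title(html):
    """Return the stripped text of the #productTitle element, or None."""
    node = BeautifulSoup(html, 'html.parser').find(id="productTitle")
    return node.get_text(strip=True) if node is not None else None


def fetch_and_report(url, dump_path="last_response.html"):
    """Download url; on failure, print the status and save the HTML."""
    page = requests.get(url, headers=HEADERS, timeout=10)
    title = extract_title(page.text)
    if title is None:
        # Keep the evidence: the status code and the exact HTML received.
        print("status:", page.status_code, "length:", len(page.text))
        with open(dump_path, "w", encoding="utf-8") as f:
            f.write(page.text)
    return title
```

Opening the saved file in a browser or editor shows what the server actually sent on the failing requests, which may be a different page from the product page.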
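Answer 2 (score: 0)
Besides the combination of requests and BeautifulSoup, you can also use the requests-html package, which downloads the web page and parses its content in one step. An example using requests-html: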
from requests_html import HTMLSession
url = r"https://www.amazon.de/JBL-Charge-Bluetooth-Lautsprecher-Schwarz-integrierter/dp/B07HGHRYCY/"
req = HTMLSession().get(url)
product_title = req.html.find("#productTitle", first=True)
print(product_title.text) #JBL Charge 4 Bluetooth-Lautsprecher in Schwarz – Wasserfeste, portable Boombox mit integrierter Powerbank – Mit nur einer Akku-Ladung bis zu 20 Stunden kabellos Musik streamen
Hope that helps.