I've been trying several different ways to get the price value from a page with this code:

    import requests
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept_Encoding": "gzip, deflate, br",
        "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Upgrade_Insecure_Requests": "1",
        "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    import csv
    from bs4 import BeautifulSoup
    #write a CSV file
    with open("/Users/eezar/Desktop/reverbsolid.csv","w",newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Guitar","Price"])
        #get the URL of target page
        pages=[]
        for n in range(1,5,1):
            url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
            #create string for URL
            r = requests.get(url)
            # get the HTML parser
            soup = BeautifulSoup(r.text, "html.parser")
            [s.extract() for s in soup('sup')]
            #identify the parent tag/container for the information
            products = soup.find_all('ul', class_ = 'tiles tiles--four-wide tiles--sidebar-width')
            #loop through container - give a name for the individual component e.g. title. Text.strip take out the text
            for title in products:
                Guitar = soup.find('img', alt=True)
                Price = soup.find('span',{'class' : 'price-display'}).text.strip()
                #write each line to the CSV using the loop
                print(Guitar)
                writer.writerow ([Guitar,Price])

But I'm getting this error:

      File "reverbsolid.py", line 32, in <module>
        Price = soup.find('span',{'class' : 'price-display'}).text.strip()
    AttributeError: 'NoneType' object has no attribute 'text'

I can see the value in the text in the page source:

> `<span class="price-display"><!-- react-text: 1023 -->$450<!-- /react-text --></span>`

Not sure what to try next?

Answer 0 (score: 0)

Here is a slightly modified version of your code that reproduces the same result:

    import requests
    import csv
    from bs4 import BeautifulSoup
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept_Encoding": "gzip, deflate, br",
        "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Upgrade_Insecure_Requests": "1",
        "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    # write a CSV file
    with open("reverbsolid.csv","w",newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Guitar","Price"])
        # get the URL of target page
        pages=[]
        for n in range(1,5,1):
            url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
            # create string for URL
            r = requests.get(url)
            # get the HTML parser
            soup = BeautifulSoup(r.text, "html.parser")
            # get all products
            container = soup.find('ul', class_ = 'tiles tiles--four-wide-max')
            products = container.find_all('li', class_ = 'tiles__tile')
            # loop through container - give a name for the individual component e.g. title. Text.strip take out the text
            for product in products:
                print(product)
                Guitar = product.find('img', alt=True)
                Price = product.find('span',{'class' : 'price-display'}).text.strip()
                #write each line to the CSV using the loop
                print(Guitar)
                writer.writerow ([Guitar,Price])

The result is:

    AttributeError: 'NoneType' object has no attribute 'text'

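The `AttributeError` is a symptom rather than the root cause: BeautifulSoup's `find()` returns `None` when no element matches, and `.text` is then looked up on `None`. A minimal sketch of a guard (the helper name `text_or_none` is hypothetical, not part of BeautifulSoup) that turns the crash into a missing value, so the loop keeps running and you can see which products came back empty:

```python
from typing import Any, Optional


def text_or_none(tag: Any) -> Optional[str]:
    """Return the stripped text of a BeautifulSoup tag, or None if the tag is missing.

    Hypothetical helper: soup.find() / tag.find() return None when nothing
    matches, and calling .text on None raises AttributeError.
    """
    if tag is None:
        return None
    # get_text(strip=True) strips whitespace from each text fragment before joining.
    return tag.get_text(strip=True)


# Usage inside the product loop (assumes `product` is a bs4 Tag):
# price = text_or_none(product.find("span", {"class": "price-display"}))
# if price is None:
#     print("no price rendered for this product")
```

This only makes the failure visible; it does not make the missing prices appear.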
If we inspect one of the products:

    <li class="tiles__tile">
      <div class="grid-card grid-card--placeholder">
        <div class="grid-card__inner">
          <div class="grid-card__main">
            <div class="grid-card__image"></div>
            <div class="grid-card__main__text">
              <div class="grid-card__title"></div>
            </div>
          </div>
          <div class="grid-card__footer">
            <div class="grid-card__footer__pricing">
              <div class="grid-card__price">
              </div>
            </div>
          </div>
        </div>
      </div>
    </li>

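We can see that there is no title or price. The reason is that this page uses JavaScript to populate the DOM, and `requests` does not run JavaScript for you, so all you get are empty placeholder products. If you want that content, the easiest approach is to use something like Selenium (https://selenium-python.readthedocs.io/), which runs a full browser for you.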
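A minimal Selenium sketch, assuming Selenium 4 with Chrome and a matching driver installed, and assuming the rendered page contains `span.price-display` elements as seen in the browser's inspector. The function name `fetch_rendered_html` and the 10-second timeout are illustrative choices, not anything from the original answer. It waits for the JavaScript-rendered prices to appear and returns the rendered HTML:

```python
def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load `url` in a real browser and return the HTML after JavaScript has run.

    Hypothetical helper; assumes Selenium 4 and a working Chrome setup.
    """
    # Imported inside the function so this module loads even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one price element has been rendered (assumed selector).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "span.price-display"))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned string can replace `r.text` in the question's code, e.g. `soup = BeautifulSoup(fetch_rendered_html(url), "html.parser")`. Starting a new browser for every page is slow; when scraping several pages, one driver could be created once and reused.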
In this case, though, there is an easier way to get the information you need. All of the raw item data is included in a `meta` tag named `apollo-state`. So:

    import csv
    import json
    import requests
    from bs4 import BeautifulSoup
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept_Encoding": "gzip, deflate, br",
        "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Upgrade_Insecure_Requests": "1",
        "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    # write a CSV file
    with open("reverbsolid.csv","w",newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["Guitar","Price"])
        # get the URL of target page
        pages=[]
        for n in range(1,5,1):
            url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
            # create string for URL
            r = requests.get(url)
            # get the HTML parser
            soup = BeautifulSoup(r.text, "html.parser")
            container = soup.find('meta', {'name': 'apollo-state'})
            container = container['content']
            container = json.loads(container)
            # parse products here

Using the `container` dictionary here should get you the prices.

Note: I would personally use the Selenium solution here, since it is more intuitive, even if it is slower to execute.
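Because the exact layout of the `apollo-state` JSON isn't shown here, one way to find the prices is to walk the decoded dictionary and collect every value stored under a given key. A minimal sketch; the helper names `load_apollo_state` and `iter_values` are hypothetical, and the key you search for (`"display"` in the usage comment) is a placeholder, so print part of `container` first to see which keys the page actually uses:

```python
import json
from typing import Any, Iterator


def load_apollo_state(content: str) -> Any:
    """Decode the `content` attribute of the apollo-state meta tag."""
    return json.loads(content)


def iter_values(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the value may itself contain further matches.
            yield from iter_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)


# Usage (assumed key name; inspect the data first, for example with
# print(json.dumps(container, indent=2)[:2000])):
# for price in iter_values(container, "display"):
#     print(price)
```

A recursive walk avoids hard-coding a path into a structure you haven't inspected yet; once you know where titles and prices live, indexing them directly is simpler and lets you keep each title paired with its price for the CSV row.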