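I'm trying to scrape the HTML below with the following Python bs4 script, but I keep getting an error (listed below) and I don't know what is causing it. If anyone could help me figure out how to make it work, that would be great!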
<span id="prodInfoPriceVat" class="prodInfoPriceVat" data-price-vat="24.73">£24.73</span>
Python BS4 script:
prices = {
    "GLDAG_MAPLE": {"url": "https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/",
                    "trader": "Gold.co.uk",
                    "metal": "Silver",
                    "type": "Maple"},
    "BBPAG_MAPLE": {"url": "https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/",
                    "trader": "Bullion By Post",
                    "metal": "Silver",
                    "type": "Maple"},
    "ATKAG_BRITANNIA": {"url": "https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin",
                        "trader": "Atkinsons Bullion",
                        "metal": "Silver",
                        "type": "Britannia"},
}
response = requests.get(
    'https://www.bullionbypost.co.uk/silver-price/silver-price-per-gram/')
soup = BeautifulSoup(response.text, 'html.parser')
AG_GRAM_SPOT = soup.find(
    'span', {'name': 'current_price_field'}).get_text()
# Convert to float
AG_GRAM_SPOT = float(re.sub(r"[^0-9\.]", "", AG_GRAM_SPOT))
# No need for another lookup
AG_OUNCE_SPOT = AG_GRAM_SPOT * 31.1035
for coin in prices:
    response = requests.get(prices[coin]["url"])
    soup = BeautifulSoup(response.text, 'html.parser')
    try:
        text_price = soup.find(
            'td', {'id': 'price-inc-vat-per-unit-1'}).get_text()  # BullionByPost
    except:
        text_price = soup.find(
            'td', {'id': 'total-price-inc-vat-1'}).get_text()  # Gold.co.uk
    else:
        text_price = soup.find(
            'span', {'class': 'prodInfoPriceVat'}).get_text()  # Issues here! Line 70
    # Grab the number
    prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))
I keep getting this error. How can I fix it?
Traceback (most recent call last):
  File "scraper.py", line 70, in <module>
    text_price = soup.find(
AttributeError: 'NoneType' object has no attribute 'get_text'
How can I get this to work?
Answer 0 (score: 1)
There is no need to use exceptions here. Just use if..else and test whether the element that was found is not None.
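As a note on why the original code most likely fails: the else clause of a try statement runs only when the try body raised no exception. On a page where the first find() succeeds, the else branch then looks for the span, gets None, and calling get_text() on it raises the AttributeError shown in the traceback. The sketch below only mimics that control flow; lookup and found_first are hypothetical names used for illustration and are not part of the question's script.

```python
def lookup(found_first):
    """Mimic the control flow of a try/except/else block."""
    steps = []
    try:
        steps.append("try")
        if not found_first:
            # Stand-in for calling .get_text() on None when find() matches nothing
            raise AttributeError("first lookup returned None")
    except AttributeError:
        # Runs only when the try body raised
        steps.append("except")
    else:
        # Runs only when the try body raised nothing
        steps.append("else")
    return steps


print(lookup(found_first=False))  # ['try', 'except']
print(lookup(found_first=True))   # ['try', 'else']
```

So the else branch is not a third fallback. It is executed exactly in the case where the first lookup already worked.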
For example:
import re
import requests
from bs4 import BeautifulSoup
prices = {
    "GLDAG_MAPLE": {"url": "https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/",
                    "trader": "Gold.co.uk",
                    "metal": "Silver",
                    "type": "Maple"},
    "BBPAG_MAPLE": {"url": "https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/",
                    "trader": "Bullion By Post",
                    "metal": "Silver",
                    "type": "Maple"},
    "ATKAG_BRITANNIA": {"url": "https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin",
                        "trader": "Atkinsons Bullion",
                        "metal": "Silver",
                        "type": "Britannia"},
}
response = requests.get(
    'https://www.bullionbypost.co.uk/silver-price/silver-price-per-gram/')
soup = BeautifulSoup(response.text, 'html.parser')
AG_GRAM_SPOT = soup.find(
    'span', {'name': 'current_price_field'}).get_text()
# Convert to float
AG_GRAM_SPOT = float(re.sub(r"[^0-9\.]", "", AG_GRAM_SPOT))
# No need for another lookup
AG_OUNCE_SPOT = AG_GRAM_SPOT * 31.1035
for coin in prices:
    print('url=', prices[coin]["url"])
    response = requests.get(prices[coin]["url"])
    soup = BeautifulSoup(response.text, 'html.parser')
    # Try each site's price element in turn; find() returns None on no match
    text_price = soup.find(
        'td', {'id': 'price-inc-vat-per-unit-1'})  # BullionByPost
    if text_price is None:
        text_price = soup.find(
            'td', {'id': 'total-price-inc-vat-1'})  # Gold.co.uk
    if text_price is None:
        text_price = soup.find(
            'span', {'class': 'prodInfoPriceVat'})  # atkinsonsbullion.com
    if text_price is None:
        print('Error, unable to find price for url=', prices[coin]["url"])
        prices[coin]["price"] = float('nan')
        continue
    text_price = text_price.get_text(strip=True)
    # Grab the number
    prices[coin]["price"] = float(re.sub(r"[^0-9\.]", "", text_price))
    print('price=', prices[coin]["price"])
Prints:
url= https://www.gold.co.uk/silver-coins/candian-silver-maple-coins/1oz-canadian-maple-silver-coin-2020/
price= 31.32
url= https://www.bullionbypost.co.uk/silver-coins/canadian-maple-1oz-silver-coin/2019-1oz-canadian-maple-silver-coin/
price= 26.88
url= https://atkinsonsbullion.com/silver/silver-coins/1oz-silver-coins/2020-uk-britannia-1oz-silver-coin
price= 24.73
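The chain of None checks above can also be folded into a loop over the selectors, which may be easier to extend when another trader is added. This is only a sketch under assumptions: SELECTORS and extract_price are hypothetical names, the selectors are the ones used in the code above, and the sample HTML is the span from the question, so no network request is made.

```python
import re

from bs4 import BeautifulSoup

# (tag, attrs) pairs tried in order; same selectors as in the answer's code
SELECTORS = [
    ("td", {"id": "price-inc-vat-per-unit-1"}),  # BullionByPost
    ("td", {"id": "total-price-inc-vat-1"}),     # Gold.co.uk
    ("span", {"class": "prodInfoPriceVat"}),     # atkinsonsbullion.com
]


def extract_price(html):
    """Return the first matching price as a float, or NaN if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    for tag, attrs in SELECTORS:
        element = soup.find(tag, attrs)
        if element is not None:
            text = element.get_text(strip=True)
            # Keep only digits and the decimal point, e.g. "£24.73" -> "24.73"
            return float(re.sub(r"[^0-9.]", "", text))
    return float("nan")


sample = ('<span id="prodInfoPriceVat" class="prodInfoPriceVat" '
          'data-price-vat="24.73">£24.73</span>')
print(extract_price(sample))  # 24.73
```

Inside the scraping loop, prices[coin]["price"] = extract_price(response.text) would then replace the three lookups.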