Question

I have been trying several different ways to get the price value from a page using this code:

import requests
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept_Encoding": "gzip, deflate, br",
    "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Upgrade_Insecure_Requests": "1",
    "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
import csv
from bs4 import BeautifulSoup


#write a CSV file
with open("/Users/eezar/Desktop/reverbsolid.csv","w",newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Guitar","Price"])
#get the URL of target page
    pages=[]
    for n in range(1,5,1):
        url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
    #create string for URL
        r = requests.get(url)
    # get the HTML parser
        soup = BeautifulSoup(r.text, "html.parser")
        [s.extract() for s in soup('sup')]
    #identify the parent tag/container for the information
        products = soup.find_all('ul', class_ = 'tiles tiles--four-wide tiles--sidebar-width')
    #loop through container - give a name for the individual component e.g. title.  Text.strip take out the text
        for title in products:
            Guitar = soup.find('img', alt=True)
            Price = soup.find('span',{'class' : 'price-display'}).text.strip()
            #write each line to the CSV using the loop
            print(Guitar)
            writer.writerow ([Guitar,Price])

But I am getting this error:

File "reverbsolid.py", line 32, in <module>
    Price = soup.find('span',{'class' : 'price-display'}).text.strip()
AttributeError: 'NoneType' object has no attribute 'text'

I can see the value as text in the page source:

> <span class="price-display"><!-- react-text: 1023 -->$450<!--
> /react-text --></span>

I am not sure what to try next.

Answer 1

Here is a slightly modified version of your code that aims for the same result:

import requests
import csv
from bs4 import BeautifulSoup


headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept_Encoding": "gzip, deflate, br",
    "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Upgrade_Insecure_Requests": "1",
    "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}


# write a CSV file
with open("reverbsolid.csv","w",newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Guitar","Price"])
    # get the URL of target page
    pages=[]
    for n in range(1,5,1):
        url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
        # create string for URL
        r = requests.get(url)
        # get the HTML parser
        soup = BeautifulSoup(r.text, "html.parser")
        # get all products
        container = soup.find('ul', class_ = 'tiles tiles--four-wide-max')
        products = container.find_all('li', class_ = 'tiles__tile')
        # loop through container - give a name for the individual component e.g. title.  Text.strip take out the text
        for product in products:
            print(product)
            Guitar = product.find('img', alt=True)
            Price = product.find('span',{'class' : 'price-display'}).text.strip()
            #write each line to the CSV using the loop
            print(Guitar)
            writer.writerow ([Guitar,Price])

The result is:

AttributeError: 'NoneType' object has no attribute 'text'

If we look at one of the products:

<li class="tiles__tile">
  <div class="grid-card grid-card--placeholder">
    <div class="grid-card__inner">
      <div class="grid-card__main">
    <div class="grid-card__image"></div>
    <div class="grid-card__main__text">
      <div class="grid-card__title"></div>
    </div>
      </div>
      <div class="grid-card__footer">
    <div class="grid-card__footer__pricing">
      <div class="grid-card__price">
      </div>
    </div>
      </div>
    </div>
  </div>
</li>

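We see that there is no title or price. The reason is that this page uses JavaScript to populate the DOM, and requests does not run JavaScript for you, so all you get are empty product placeholders. If you need the rendered page, the simplest approach is to use something like selenium (https://selenium-python.readthedocs.io/), which runs a full browser for you.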
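Because find() returns None when nothing matches, calling .text on the result is what raises the AttributeError for these placeholder tiles. A minimal sketch of a guard follows; the helper name text_or_none is hypothetical, and SimpleNamespace is used here only as a stand-in for a BeautifulSoup tag so the snippet runs without any network access.

```python
from types import SimpleNamespace


def text_or_none(node):
    """Return the stripped text of a tag-like object, or None if the lookup found nothing."""
    if node is None:
        return None
    return node.text.strip()


# Stand-ins for what product.find('span', {'class': 'price-display'}) might return:
# a tag with text on a rendered page, or None on an empty placeholder tile.
found = SimpleNamespace(text="  $450 ")
missing = None

print(text_or_none(found))    # price text with surrounding whitespace removed
print(text_or_none(missing))  # None instead of an AttributeError
```

In the loop above, the same idea would be Price = text_or_none(product.find('span', {'class': 'price-display'})), followed by skipping the row when Price is None. This only stops the crash; it does not make the missing prices appear.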

In this case, there is a simpler way to get the information you need. All of the raw item data is contained in a meta tag named apollo-state. So:

import csv
import json
import requests
from bs4 import BeautifulSoup


headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept_Encoding": "gzip, deflate, br",
    "Accept_Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Upgrade_Insecure_Requests": "1",
    "User_Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}


# write a CSV file
with open("reverbsolid.csv","w",newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Guitar","Price"])
    # get the URL of target page
    pages=[]
    for n in range(1,5,1):
        url=("https://reverb.com/marketplace/electric-guitars/solid-body?page={}".format(n))
        # create string for URL
        r = requests.get(url)
        # get the HTML parser
        soup = BeautifulSoup(r.text, "html.parser")
        container = soup.find('meta', {'name': 'apollo-state'})
        container = container['content']
        container = json.loads(container)
        # parse products here

Using the container dictionary here should get you the prices.
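The layout of the apollo-state JSON is not shown in this thread, so the following is only a sketch of one way to dig values out of such a payload. It uses the standard library's html.parser instead of BeautifulSoup so that it runs on its own. SAMPLE_HTML, the "display" key, and the helper names are all made-up assumptions for illustration; inspect the real content attribute to find the actual key names.

```python
import json
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Collect the content attribute of the first <meta> tag with a given name."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == self.target_name and self.content is None:
            self.content = attrs.get("content")


def find_key(obj, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Made-up sample page; the real apollo-state structure may look quite different.
SAMPLE_HTML = (
    '<html><head><meta name="apollo-state" content="'
    '{&quot;Listing:1&quot;: {&quot;title&quot;: &quot;Example Guitar&quot;, '
    '&quot;price&quot;: {&quot;display&quot;: &quot;$450&quot;}}}'
    '"></head></html>'
)


def extract_values(html, key):
    """Return all values found under `key` in the apollo-state JSON, or [] if the tag is absent."""
    parser = MetaContentParser("apollo-state")
    parser.feed(html)
    if parser.content is None:
        return []
    return list(find_key(json.loads(parser.content), key))


print(extract_values(SAMPLE_HTML, "display"))
```

A recursive search like this is convenient while exploring an unfamiliar payload. Once the real structure is known, indexing the container dictionary directly is clearer and avoids picking up unrelated values that happen to share a key name.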

Note: I would use the selenium solution here, because it is more intuitive even though it runs more slowly.
