Question

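I'm new to Python and am building a web scraper. I want every instance of the second span across the whole page. My goal is to get all the car brand names (e.g. Nissan) and car model names (e.g. Pathfinder).

However, I can't figure out how to get all the car models. I've tried indexing, but I couldn't build a loop that returns all the model names.

Below is the HTML of the page I want to get the names from.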

   <h3 class="brandModelTitle">
    <span class="txtGrey3">NISSAN</span>
    <span class="txtGrey3">PATHFINDER</span>

    <span class="version txtGrey7C noBold">(2)
    2.5 DCI 190 LE 7PL EURO5</span>

    </h3>

Below is the code I use to find all the brand names:

Names = []
Prices_Cars = []
for var1 in soup.find_all('h3', class_='brandModelTitle'):
    brand_Names = var1.span.text
    Names.append(brand_Names)

Answer 1

soup.find_all('h3', class_='brandModelTitle') only returns the h3 elements; you need to loop over each h3 and find all the spans inside it.

Try this:

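from bs4 import BeautifulSoup

# Named html_doc rather than str, so the built-in str type is not shadowed
html_doc = """
   <h3 class="brandModelTitle">
    <span class="txtGrey3">NISSAN</span>
    <span class="txtGrey3">PATHFINDER</span>

    <span class="version txtGrey7C noBold">(2)
    2.5 DCI 190 LE 7PL EURO5</span>

    </h3>
"""

# The 'html5lib' parser requires the html5lib package to be installed
soup = BeautifulSoup(html_doc, 'html5lib')

result = []
for var1 in soup.find_all('h3', class_='brandModelTitle'):
    dic = {}
    # Only the brand and model spans carry the txtGrey3 class
    spans = var1.find_all('span', class_='txtGrey3')
    dic["Brands"] = spans[0].get_text()  # first span: brand
    dic["model"] = spans[1].get_text()   # second span: model
    result.append(dic)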
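
As a variation on the loop above, here is a minimal sketch that skips any h3 with fewer than two matching spans instead of raising an IndexError, and strips surrounding whitespace from the text. The function name extract_brand_models and the incomplete second heading in the sample markup are made up for illustration. It uses Python's built-in html.parser so that html5lib does not have to be installed.

```python
from bs4 import BeautifulSoup

# Sample markup; the second <h3> is a hypothetical incomplete entry.
HTML = """
<h3 class="brandModelTitle">
  <span class="txtGrey3">NISSAN</span>
  <span class="txtGrey3">PATHFINDER</span>
  <span class="version txtGrey7C noBold">(2)
  2.5 DCI 190 LE 7PL EURO5</span>
</h3>
<h3 class="brandModelTitle">
  <span class="txtGrey3">RENAULT</span>
</h3>
"""


def extract_brand_models(html):
    """Return one {'brand', 'model'} dict per heading that has both spans."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for h3 in soup.find_all("h3", class_="brandModelTitle"):
        spans = h3.find_all("span", class_="txtGrey3")
        if len(spans) < 2:
            # Skip headings that lack a brand span or a model span.
            continue
        rows.append({
            "brand": spans[0].get_text(strip=True),
            "model": spans[1].get_text(strip=True),
        })
    return rows


print(extract_brand_models(HTML))
```

Whether silently skipping incomplete headings is the right choice depends on the page; logging them may be preferable if every listing is expected to have both spans.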

Answer 2

You can use Scrapy. I've included only the parse function:

def parse(self, response):
    #Remove XML namespaces
    response.selector.remove_namespaces()

    #Extract article information
    brands = response.xpath('//h3/span[1]/text()').extract()
    models = response.xpath('//h3/span[2]/text()').extract()
    details = response.xpath('//h3/span[3]/text()').extract()


    for item in zip(brands,models,details):
        scraped_info = {
            'brand' : item[0],
            'model' : item[1],
            'details' : item[2]
        }

        yield scraped_info

Scrapy information: https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/
XPath examples: https://www.w3schools.com/xml/xpath_examples.asp
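
To try the positional XPath idea without setting up a Scrapy project, the sketch below uses the standard library's xml.etree.ElementTree, which supports a limited XPath subset that includes positional predicates such as span[2]. It assumes the markup is well-formed XML, which real pages often are not. It also walks each h3 separately rather than zipping three lists, so a heading with a missing span cannot shift the later rows. The wrapper div and the name rows_from_markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup; the outer <div> is a hypothetical wrapper.
MARKUP = """
<div>
  <h3 class="brandModelTitle">
    <span class="txtGrey3">NISSAN</span>
    <span class="txtGrey3">PATHFINDER</span>
    <span class="version txtGrey7C noBold">(2)
    2.5 DCI 190 LE 7PL EURO5</span>
  </h3>
</div>
"""


def rows_from_markup(markup):
    """Return one dict per <h3>, reading its spans by position."""
    root = ET.fromstring(markup)
    rows = []
    for h3 in root.iter("h3"):
        brand = h3.find("span[1]")    # first <span> child
        model = h3.find("span[2]")    # second <span> child
        details = h3.find("span[3]")  # third <span> child, may be absent
        if brand is None or model is None:
            continue
        details_text = details.text if details is not None else ""
        rows.append({
            "brand": (brand.text or "").strip(),
            "model": (model.text or "").strip(),
            # Collapse the line break inside the details text.
            "details": " ".join((details_text or "").split()),
        })
    return rows


print(rows_from_markup(MARKUP))
```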

How do I grab the second "span"? (Making a web scraper in Python)
