Question

I'm using bs4 with Python and trying to get data from a web page. I used inspect element on the information I want, but both values have the same tag and class.

             <a class="cell__value" data-tracker-action="click" data-tracker-label="information_technology.01" href="/markets/sectors/information-technology">
             Information Technology
            </a>
           </div>
           <div class="cell__return">
            <div class="cell__label">
             % Price Change
            </div>
            <div class="cell__value" data-type="better">
             +0.05%
            </div>
           </div>
          </div>
          <div class="cell">
           <div class="cell__name">
            <div class="cell__label">
             Industry
            </div>
            <a class="cell__value" data-tracker-action="click" data-tracker-label="information_technology.02" href="/markets/sectors/information-technology">
             Software &amp; Services
            </a>
           </div>
           <div class="cell__return">
            <div class="cell__label">
             % Price Change
            </div>
            <div class="cell__value" data-type="worse">
             -0.04%
            </div>
           </div>
          </div>
         </div>

This is what I'm doing:

sect= soup.find("a",{"data-tracker-label":"information_technology.01"})
print sect.text
sect_per= soup.find("div",{"data-type":"worse"or"better"})
print sect_per.text
ind=soup.find("a",{"data-tracker-label":"information_technology.02"})
print ind.text
ind_per=soup.find("div",{"div",{"data-type":"worse"or"better"})
print ind_per

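print sect_per and print ind_per give me the same result because both elements share the same class.

I need to extract +0.05% and -0.04% separately.

Please suggest how I can do this.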

Answer 1

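from bs4 import BeautifulSoup

# `example` is a string holding the HTML shown in the question.
soup = BeautifulSoup(example, "html.parser")

for cell in soup.find_all("div", class_="cell"):
    name = ""
    # The sector/industry name is the <a class="cell__value"> link;
    # string=True (spelled text=True in older bs4) requires it to hold text.
    namecell = cell.find("a", class_="cell__value", string=True)
    if namecell is not None:
        name = namecell.get_text(strip=True)
    # The percentage is the <div class="cell__value"> in the same cell.
    price_change = cell.find("div", class_="cell__value").get_text(strip=True)
    print("%s: Price Change:  %s" % (name, price_change))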

Which outputs:

Information Technology: Price Change:  +0.05%

Software & Services: Price Change:  -0.04%

You can save these values for further processing.
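
As a minimal sketch of saving the values, each name and its price change can be collected into a dictionary. The `html` string below is a trimmed copy of the markup from the question; the opening tags and the "Sector" label of the first cell are not visible in the question and are assumed to mirror the second cell. `changes` is a hypothetical variable name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup. The start of the first cell
# (including the "Sector" label) is an assumption.
html = """
<div class="cell">
 <div class="cell__name">
  <div class="cell__label">Sector</div>
  <a class="cell__value" href="/markets/sectors/information-technology">
   Information Technology
  </a>
 </div>
 <div class="cell__return">
  <div class="cell__label">% Price Change</div>
  <div class="cell__value" data-type="better">+0.05%</div>
 </div>
</div>
<div class="cell">
 <div class="cell__name">
  <div class="cell__label">Industry</div>
  <a class="cell__value" href="/markets/sectors/information-technology">
   Software &amp; Services
  </a>
 </div>
 <div class="cell__return">
  <div class="cell__label">% Price Change</div>
  <div class="cell__value" data-type="worse">-0.04%</div>
 </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each sector/industry name to its price change string.
changes = {}
for cell in soup.find_all("div", class_="cell"):
    link = cell.find("a", class_="cell__value")
    value = cell.find("div", class_="cell__value")
    if link is None or value is None:
        continue  # skip cells that lack either piece
    changes[link.get_text(strip=True)] = value.get_text(strip=True)

print(changes)
```

Searching inside each cell keeps a name paired with its own percentage, regardless of whether the value is marked "better" or "worse".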

Answer 2

or returns its left operand if that operand is truthy (for strings, any non-empty string is truthy):

>>> "worse" or "better"
'worse'

So the following line:

ind_per = soup.find("div", {"data-type": "worse" or "better"})

is basically the same as:

ind_per = soup.find("div", {"data-type": "worse"})
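
The same point can be checked without Beautiful Soup at all. This small sketch shows that the dictionary is evaluated before `find` ever receives it; `attrs` is just an illustrative variable name.

```python
# The dictionary is fully evaluated before find() is called, so the
# `or` expression has already collapsed to its first truthy operand.
attrs = {"data-type": "worse" or "better"}
print(attrs)            # {'data-type': 'worse'}

# `or` only falls through to the right operand when the left is falsy.
print("" or "better")   # better
```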

You need to query them separately:

ind_per = soup.find("div", {"data-type": "worse"})
print(ind_per)
ind_per = soup.find("div", {"data-type": "better"})
print(ind_per)

Or use a for loop:

for data_type in ('worse', 'better'):
    ind_per = soup.find("div", {"data-type": data_type})
    print(ind_per)
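
As an alternative sketch (not part of the original answer): Beautiful Soup accepts a list as an attribute filter and matches any item in it, so a single `find_all` call can return both divs in document order. The `html` string is a reduced stand-in for the question's page.

```python
from bs4 import BeautifulSoup

# Reduced stand-in for the question's markup.
html = """
<div class="cell__return">
 <div class="cell__value" data-type="better">+0.05%</div>
</div>
<div class="cell__return">
 <div class="cell__value" data-type="worse">-0.04%</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list value matches elements whose attribute equals any item in it.
divs = soup.find_all("div", attrs={"data-type": ["worse", "better"]})

# Results come back in document order: sector first, then industry.
sect_per, ind_per = [d.get_text(strip=True) for d in divs]
print(sect_per)
print(ind_per)
```

This relies on the page order staying sector first, industry second; if that is not guaranteed, searching within each cell is the safer choice.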

Answer 3

<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span data-value="2333089" name="nv">2,333,089</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>

I want to get a movie's vote count and gross, but both spans have the same name, "nv", so we index into the matches:

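# `container` is the parsed element that wraps the <p> block above.
# Both spans share name="nv", so fetch them once and pick by index.
nv_spans = container.find_all("span", {"name": "nv"})
vote = nv_spans[0].text   # first match: votes
gross = nv_spans[1].text  # second match: gross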

Here the votes come first, then the gross.
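
A self-contained sketch of the indexing approach, using the snippet above as input. Reading the `data-value` attribute to get raw numbers is an addition not in the original answer, and `vote_count` and `gross_value` are hypothetical names.

```python
from bs4 import BeautifulSoup

html = """
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span data-value="2333089" name="nv">2,333,089</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
"""
container = BeautifulSoup(html, "html.parser")

# `name` has to go in the attrs dict because the `name` keyword of
# find_all() is already taken by the tag-name parameter.
nv_spans = container.find_all("span", {"name": "nv"})
vote, gross = nv_spans[0].text, nv_spans[1].text

# The unformatted numbers are stored in the data-value attribute.
vote_count = int(nv_spans[0]["data-value"])
gross_value = int(nv_spans[1]["data-value"].replace(",", ""))

print(vote, gross, vote_count, gross_value)
```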

Web Scraping: how to extract data from two identical tags using bs4 in Python

3 answers: