Question

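I am trying to scrape information such as the asset class, category, and risk potential (as a number, not the image shown at the link) from here. After the page loads, its source shows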

<div data-ng-class="{layerLinkRight : data.isPriceYieldSecLayerLink, wraper : isETF, secYieldDataWrapper : !data.isLayer &amp;&amp; data.codeIsLayer &amp;&amp; isETF}" class="wraper">
                    <!-- ngIf: !data.isLayer --><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding sceIsLayer arrangeSec">Asset class</span><!-- end ngIf: !data.isLayer -->
                    <!-- ngIf: data.isLayer -->
                    <!-- ngIf: !data.codeIsLayer --><span data-ng-if="!data.codeIsLayer" data-ng-class="{sceIsLayer : isETF}" data-ng-bind-html="data.codeValue" class="ng-scope ng-binding sceIsLayer"></span><!-- end ngIf: !data.codeIsLayer -->
                    <!-- ngIf: data.codeIsLayer -->
                </div>

I started with the basics by fetching the images, then tried to capture the "Asset class" value with the following code

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import os
url="https://investor.vanguard.com/etf/profile/BNDW/"
#web_r=requests.get(url)
#web_soup=BeautifulSoup(web_r.text,'html.parser')
#<img src=''/

driver = webdriver.Firefox()#executable_path=r'/home/suraj/Documents/python-virtual-environments/stock_analysis/Files/geckodriver.exe')
driver.get(url)#"http://www.chrisburkard.com/")
html=driver.execute_script("return document.documentElement.outerHTML")
sel_soup=BeautifulSoup(html,'html.parser')
print(len(sel_soup.findAll("img")))
#Getting sample Images
images=[]
for i in sel_soup.findAll("img"):
    #print(i)
    #print(dir(i))
    src = i["src"]
    images.append(src)
print(images)
asset_class=[]
#Getting Asset Class
for i in sel_soup.findAll("span data-ng-if"):
    a_class=i["data.codeValue"]
    asset_class.append(a_class)
print(asset_class)

The output for asset_class is empty.
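One likely reason for the empty list: `findAll("span data-ng-if")` looks for a tag whose name is literally `span data-ng-if`, which matches nothing, and `data.codeValue` is the value of the `data-ng-bind-html` attribute, not an attribute name that can be indexed. Below is a minimal sketch of selecting the same spans with CSS attribute selectors. It runs against a trimmed copy of the fragment shown above instead of the live page, and the names `SAMPLE_HTML` and `extract_pairs` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the rendered fragment from the question (not the live page).
SAMPLE_HTML = """
<div class="wraper">
  <span data-ng-if="!data.isLayer" data-ng-bind-html="data.value"
        class="ng-scope ng-binding sceIsLayer arrangeSec">Asset class</span>
  <span data-ng-if="!data.codeIsLayer" data-ng-bind-html="data.codeValue"
        class="ng-scope ng-binding sceIsLayer"></span>
</div>
"""


def extract_pairs(html):
    """Return (label, value) text pairs, one per div.wraper block."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for wrapper in soup.select("div.wraper"):
        # Match spans by the value of their data-ng-bind-html attribute.
        label = wrapper.select_one('span[data-ng-bind-html="data.value"]')
        value = wrapper.select_one('span[data-ng-bind-html="data.codeValue"]')
        if label is None:
            continue
        pairs.append((
            label.get_text(strip=True),
            value.get_text(strip=True) if value is not None else "",
        ))
    return pairs


pairs = extract_pairs(SAMPLE_HTML)
print(pairs)
```

In the fragment above the `data.codeValue` span has no text, so the value comes back as an empty string here. With Selenium, it may help to wait for the elements before reading the HTML, for example with `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.wraper")))`, though whether that populates the value on this particular page is an assumption.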

Scraping dynamic content inside a wrapper

0 answers: