Extracting a value with Python Beautiful Soup in web scraping

Time: 2020-11-11 15:55:06

Tags: python html css web-scraping beautifulsoup

How can I extract the value "1.00 TK = 779.8" from the HTML below?

I tried the code below, but it did not work:

import requests
from bs4 import BeautifulSoup

html = requests.get(<url>).text

# The page contains this HTML:
# <span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>

soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())

Error:

 AttributeError: 'NoneType' object has no attribute 'find_next'

2 answers:

Answer 0 (score: 0)

Use find_next(), which returns the first match:

from bs4 import BeautifulSoup

html = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id='driveValue').find_next(text=True).strip())

Output:

1.00 TK = 779.8
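
The snippet above parses a literal string, so the span is always there. The AttributeError in the question means soup.find(id='driveValue') returned None, i.e. the id was not in the HTML that requests downloaded. A likely cause (an assumption, since the real page is not shown) is that the span is filled in by JavaScript after the page loads, which the ng-binding class hints at. Here is a minimal sketch that guards against the missing element instead of crashing. The helper name first_text is made up for illustration, and it uses find(string=True), which only looks inside the tag, rather than find_next():

```python
from bs4 import BeautifulSoup


def first_text(html, element_id):
    """Return the first text node inside the element with this id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:  # the id is not in the HTML we were given
        return None
    node = tag.find(string=True)  # first text node among the tag's descendants
    return node.strip() if node else None


static_html = (
    '<span id="driveValue"> 1.00 TK = 779.8'
    "<span>Disk Drive Value</span>(DDV) </span>"
)

print(first_text(static_html, "driveValue"))                   # 1.00 TK = 779.8
print(first_text("<html><body></body></html>", "driveValue"))  # None
```

If the second case is what you get for the real page, the value has to come from a rendered page (a browser driver) or from whatever request the page itself makes for the data.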

Edit: using Selenium

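from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

URL = "https://www.westernunion.com/us/en/web/send-money/start?SrcCode=12345&ReceiveCountry=IN&SendAmount=100&ISOCurrency=CNY&FundsOut=BA&FundsIn=CreditCard"

# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
driver = webdriver.Chrome(r"C:\path\to\chromedriver.exe")
driver.get(URL)
sleep(10)  # give the page's JavaScript time to render the value

# The rendered HTML can also be handed to Beautiful Soup for further parsing
soup = BeautifulSoup(driver.page_source, "html.parser")

# find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector, which newer Selenium releases removed
price = driver.find_element(By.CSS_SELECTOR, "span.ng-binding.ng-scope").text
print(price)

driver.quit()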

Output:

1.00 USD = 73.9375 Indian Rupee (INR)

Answer 1 (score: -2)

Hope this helps.

from lxml import etree
txt = '''<span _ngcontent-his-c101="" id="driveValue" class="ng-binding ng-scope"> 1.00 TK = 779.8<span _ngcontent-his-c101="">Disk Drive Value</span>(DDV) </span>'''

root = etree.fromstring(txt)
for span in root.xpath('//span[contains(@class, "ng-binding ng-scope")]'):
    # .text is the text before the first child element; strip the padding
    print(span.text.strip())

Output:

1.00 TK = 779.8
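
The reason .text is enough here is how lxml splits mixed content: an element's .text holds only the text before its first child, and text that follows a child's closing tag is stored on that child's .tail. A small sketch with a trimmed copy of the span (attributes shortened for readability). Note that etree.fromstring is a strict XML parser, so this relies on the fragment being well-formed:

```python
from lxml import etree

txt = (
    '<span id="driveValue"> 1.00 TK = 779.8'
    "<span>Disk Drive Value</span>(DDV) </span>"
)

root = etree.fromstring(txt)
inner = root[0]  # the nested <span>

head = root.text.strip()   # text before the nested span
label = inner.text         # text inside the nested span
tail = inner.tail.strip()  # text after the nested span closes

print(head)   # 1.00 TK = 779.8
print(label)  # Disk Drive Value
print(tail)   # (DDV)
```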