Extracting text from an HTML string with Beautiful Soup

Time: 2020-05-16 22:30:15

Tags: python html beautifulsoup

I wrote the following code to extract a price from a web page:

from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.teleborsa.it/azioni/intesa-sanpaolo-isp-it0000072618-SVQwMDAwMDcyNjE4"
html = urlopen(url)
soup = BeautifulSoup(html,'lxml')
prize = soup.select('.h-price')
print(prize)

The output is:

<span class="h-price fc0" id="ctl00_phContents_ctlHeader_lblPrice">1,384</span>

I want to extract just the value 1,384.

2 Answers:

Answer 0 (score: 0)

Try this in JavaScript:

document.getElementById("ctl00_phContents_ctlHeader_lblPrice").innerText

Alternatively, if you have dynamic elements, you can loop over each element and read its innerText.
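
The snippet in this answer is browser JavaScript, while the question uses Python. A minimal sketch of the same two ideas (lookup by id, and looping over every match) with BeautifulSoup might look like the following. It assumes an inline HTML string in place of the downloaded page and the built-in html.parser instead of lxml, so that it runs without network access:

```python
from bs4 import BeautifulSoup

# Assumption: this inline HTML stands in for the page fetched in the question.
html = '<span class="h-price fc0" id="ctl00_phContents_ctlHeader_lblPrice">1,384</span>'
soup = BeautifulSoup(html, "html.parser")

# Look the element up by its id, the counterpart of document.getElementById.
by_id = soup.find(id="ctl00_phContents_ctlHeader_lblPrice")
print(by_id.get_text())

# select() returns a list of matching tags; loop over it to read each one's text.
texts = [tag.get_text(strip=True) for tag in soup.select(".h-price")]
print(texts)
```

Note that select() always returns a list, even for a single match, which is why printing its result directly shows the whole tag rather than only the text.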

Answer 1 (score: 0)

You can use the .text property to get the text you need.

For example:

from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.teleborsa.it/azioni/intesa-sanpaolo-isp-it0000072618-SVQwMDAwMDcyNjE4"
html = urlopen(url)
soup = BeautifulSoup(html,'lxml')
prize = soup.select_one('.h-price') # <- change to .select_one() to get only one element
print(prize.text)                   # <- use the .text property to get text of the tag

Prints:

1,384
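
The extracted text is still a string. If a number is needed, a small sketch of the conversion could look like this. It assumes the site uses the Italian number format, where the comma is the decimal separator and a dot groups thousands; that is an assumption about the page, not something stated in the question:

```python
# Assumption: the price text uses a decimal comma (Italian format), e.g. "1,384".
price_text = "1,384"

# Drop any thousands separators (dots), then turn the decimal comma into a dot.
price = float(price_text.replace(".", "").replace(",", "."))
print(price)
```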