I am trying to get values from a web page. My Python code currently looks like this:
from lxml import html
import requests
if __name__ == "__main__":
    page = requests.get('https://www.example.com/example')
    tree = html.fromstring(page.content)
    print(tree.xpath('//div[@class="previous-crashes"]/text()'))
Here is an example of the HTML I am trying to get. In theory, I want a list containing 12.54x, 5x, 1.06x, 12.54x, 1.93x. With the current code, it always prints an empty list.
Answer 0 (score: 0)
I'm not sure, but the site may have some anti-scraping measures in place, so the document you get back is empty.
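One way to test that guess is to look at what the server actually returned before blaming the selector. The sketch below is only a rough heuristic: the function name `diagnose` and its messages are made up for illustration, and the default marker is simply the class name from the question. Note that a site can also serve a challenge page with status 200, which is why the marker check matters.

```python
def diagnose(status_code, body, marker="previous-crashes"):
    """Give a rough guess at why a scrape came back empty (hypothetical helper)."""
    if status_code != 200:
        # A 403 or 429 here often points at blocking or rate limiting
        return "blocked or failed: HTTP %d" % status_code
    if not body.strip():
        return "empty body: the server sent no HTML"
    if marker not in body:
        # The class never appears in the raw HTML, so no selector can match it;
        # the content may be added by JavaScript or withheld from scripts
        return "marker missing: content may be rendered by JavaScript or withheld"
    return "marker present: check the selector"


print(diagnose(403, ""))
print(diagnose(200, "<html><body></body></html>"))
print(diagnose(200, '<div class="previous-crashes"></div>'))
```

With the code from the question you would call it as diagnose(page.status_code, page.text).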
Answer 1 (score: 0)
You can try:
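from bs4 import BeautifulSoup
import requests

req = requests.get("https://domain.tld")
# Name the parser explicitly; 'html.parser' ships with Python
soup = BeautifulSoup(req.text, 'html.parser')
# Collect every <span class="pointer"> element
pointers = soup.find_all("span", {"class": "pointer"})
for pointer in pointers:
    print(pointer.text)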
Answer 2 (score: 0)
from lxml import html
import requests

page = requests.get('https://www.example.com/')
doc = html.fromstring(page.content)
# Find the container(s), then the pointer spans inside each one
elements = doc.find_class('previous-crashes')
for el in elements:
    pointers = el.find_class('pointer')
    for pointer in pointers:
        print(pointer.text_content())
This will give you the text values of the spans shown in the linked HTML image.
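Since the question asks for a list rather than printed lines, here is a minimal offline sketch of the same approach that collects the values. The markup in `SAMPLE_HTML` is an assumption: the linked image is not reproduced here, so the structure (spans with class `pointer` inside a div with class `previous-crashes`) is inferred from the class names used in this thread. The function name `crash_values` is likewise made up for illustration.

```python
from lxml import html

# Hypothetical markup standing in for the page in the question
SAMPLE_HTML = """
<div class="previous-crashes">
  <span class="pointer">12.54x</span>
  <span class="pointer">5x</span>
  <span class="pointer">1.06x</span>
  <span class="pointer">12.54x</span>
  <span class="pointer">1.93x</span>
</div>
"""


def crash_values(markup):
    """Return the text of every 'pointer' element inside 'previous-crashes'."""
    doc = html.fromstring(markup)
    values = []
    for container in doc.find_class("previous-crashes"):
        for pointer in container.find_class("pointer"):
            # text_content() gathers all nested text; strip() drops padding
            values.append(pointer.text_content().strip())
    return values


values = crash_values(SAMPLE_HTML)
print(values)
```

If this works on the sample but the live page still yields an empty list, the values are probably not present in the HTML that requests receives.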