Question

I have a website, and I want to use bs4 to extract the text from div tags on an external website. It is a Flask website.

#Importing libraries
from flask import Flask, render_template 
import sys
import json
import requests
import urllib.request
import time
from bs4 import BeautifulSoup


#Importing files and class from other python files in the project
sys.path.append('.')
from webScrape import getInformation

#Making a new app instance
app = Flask(__name__)

#When the app is on route /, render index.html
@app.route('/')
def index():
    URL = 'https://covidstat.info/home'

    HTML = requests.get(URL)
    soup = BeautifulSoup(HTML.text, "html.parser")
    tag = soup.findAll('div', {'class': 'count'})
    print(tag.text)
    return render_template('index.html', tag=tag)

#Running the app on port 5000
if __name__== '__main__':
    app.run(debug=True, host='0.0.0.0',)

Oh, and one more question: does anyone know how I can get an element by XPath in bs4?

Answer 1

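In this case, soup.findAll returns a list of divs, not a single div, so calling .text on it fails and you have to loop over the results. You can also use a list comprehension like this: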

tag_text = [t.text for t in tag]

which returns: ['2,735,342', '2,025,878', '329,757', '442', '4', '2,615,920']

Alternatively, you can use soup.find instead. It returns only the first matching div, so you can access its text directly with tag.text, which gives '2,735,342'.
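A minimal, self-contained sketch of the difference between findAll/find_all and find. It parses a small hard-coded HTML string, an assumption standing in for HTML.text (the real markup of https://covidstat.info/home is not reproduced here), so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the fetched page; the real page
# may be structured differently.
html = """
<div class="count">2,735,342</div>
<div class="count">2,025,878</div>
<div class="other">ignored</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all (findAll is the older alias) returns a ResultSet, a list of
# Tag objects. A list has no .text, so tags.text raises AttributeError.
tags = soup.find_all("div", {"class": "count"})
tag_text = [t.text for t in tags]
print(tag_text)

# find returns only the first matching Tag, or None if nothing matches,
# so check for None before reading .text.
first = soup.find("div", {"class": "count"})
first_text = first.text if first is not None else None
print(first_text)
```

In the Flask view, the list comprehension result (rather than the ResultSet) is what would be printed or passed to render_template.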

To get an element's XPath, use the browser's inspector: right-click the text you want -> Inspect Element -> right-click the div tag -> Copy -> Copy XPath.

The XPath of the number used above is:

XPath

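As far as I know, BS4 does not support XPath selection, so you would have to switch to another library. I know Selenium supports it, but it is probably not the best fit for this task.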
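One library that evaluates XPath on parsed HTML without driving a browser is lxml. It is swapped in here as an alternative to Selenium and is not part of the original answer. A minimal sketch, again on a hypothetical hard-coded page rather than the real site:

```python
from lxml import html as lxml_html

# Hypothetical page content standing in for the fetched HTML.
page = """
<html><body>
  <div class="count">2,735,342</div>
  <div class="count">2,025,878</div>
</body></html>
"""

tree = lxml_html.fromstring(page)

# Text nodes of every div whose class attribute is exactly "count".
# (An element with several classes, e.g. class="count big", would not match.)
counts = tree.xpath('//div[@class="count"]/text()')
print(counts)

# XPath positions are 1-based: select only the first matching div.
first = tree.xpath('(//div[@class="count"])[1]/text()')
print(first)
```

With requests, the tree could be built from the response body instead, e.g. lxml_html.fromstring(HTML.content). Like bs4, lxml only sees the HTML the server returns; if the numbers were inserted by JavaScript, a browser-driven tool such as Selenium would be needed.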
