Question

I'm using the snippet below to parse part of the HTML from the link in the code, specifically these divs:

<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>

This is the code I'm trying to use:

import requests, urllib.parse
from bs4 import BeautifulSoup, element
r = requests.get('https://herf.io/bids?search=tatuaje%20tattoo')
soup = BeautifulSoup(r.text, 'html.parser')

avgPrice = soup.find("div", {"id": "avg-price"})
lowPrice = soup.find("div", {"id": "best-price"})
highPrice = soup.find("div", {"id": "worst-price"})

print(avgPrice)
print(lowPrice)
print(highPrice)
print("Average Price: {}".format(avgPrice))
print("Low Price: {}".format(lowPrice))
print("High Price: {}".format(highPrice))

However, it doesn't include the prices inside the divs. The result looks like this:

<div class="price big-price" id="avg-price"></div>
<div class="price big-price" id="best-price"></div>
<div class="price big-price" id="worst-price"></div>
Average Price: <div class="price big-price" id="avg-price"></div>
Low Price: <div class="price big-price" id="best-price"></div>
High Price: <div class="price big-price" id="worst-price"></div>

Any ideas? I'm sure I'm overlooking something small, but I'm at my wits' end right now. Lol.

Answer 1

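Sure you can, but only when the data does not have to be computed by JavaScript, and here it does. For this site, you can use Fiddler to find the URL that the JavaScript uses to load the data, and then fetch the JSON (or whatever it returns) from that URL directly. Here is a simple example, written after I used Fiddler to find out where the data comes from. Remember to set verify=False while using the Fiddler certificate.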

import requests 

with requests.Session() as se:
    se.headers = {
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
        "Referer": "https://herf.io/bids?search=tatuaje%20tattoo",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Accept-Encoding":"gzip, deflate, br",
        }
    data = [
        "search=tatuaje+tattoo",
        "types=",
        "sites=",
    ]

    # requests expects a name -> value mapping; "connect.sid" is the cookie name.
    cookies = {
        "connect.sid": "s%3ANYNh5s6LzCVWY8yE9Gra8lxj9OGHPAK_.vGiBmTXvfF4iDScBF94YOXFDmC80PQxY%2FX9FLQ23hYI"}

    url = "https://herf.io/bids/search/open"

    price = "https://herf.io/bids/search/stats"

    req = se.post(price,data="&".join(data),cookies=cookies,verify=False)
    print(req.text)

Output

{"bottomQuarter":4.4,"topQuarter":3.31,"median":3.8,"mean":4.03,"stddev":1.44,"moe":0.08,"good":2.59,"great":1.14,"poor":5.47,"bad":6.91,"best":0.2,"worst":15.98,"count":1121}
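
Once the endpoint returns JSON, the three values the question asks for can be read straight from the decoded object. The following is a minimal sketch, assuming the response body looks like the output above and that the keys for the average, lowest and highest prices are named `mean`, `best` and `worst`. Those key names and the `extract_prices` helper are illustrative assumptions, not a documented API of the site.

```python
import json

# Sample payload mirroring the output above; with a live response,
# req.json() would give the same kind of dict.
sample = '{"median": 3.8, "mean": 4.03, "best": 0.2, "worst": 15.98, "count": 1121}'


def extract_prices(payload):
    """Return (average, low, high) from a decoded stats payload.

    The key names are assumed from the sample output; .get() yields
    None for any key the server does not send.
    """
    return payload.get("mean"), payload.get("best"), payload.get("worst")


stats = json.loads(sample)
avg_price, low_price, high_price = extract_prices(stats)

print("Average Price: {}".format(avg_price))
print("Low Price: {}".format(low_price))
print("High Price: {}".format(high_price))
```

Using `.get()` rather than indexing keeps the script from raising `KeyError` if the server ever drops or renames one of the fields.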

Answer 2

You can use the text attribute to pull out the text:

print("Average Price: {}".format(avgPrice.text))
print("Low Price: {}".format(lowPrice.text))
print("High Price: {}".format(highPrice.text))
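
To see what `.text` returns, it helps to run it against HTML that already contains the prices. The sketch below uses the static divs from the question instead of the live page; `price_text` is a hypothetical helper introduced only for illustration. When the div is empty in the downloaded HTML, as in the question's output, the same call returns an empty string.

```python
from bs4 import BeautifulSoup

# Static HTML copied from the question, with the prices already present.
html = """
<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>
"""

soup = BeautifulSoup(html, "html.parser")


def price_text(soup, div_id):
    """Return the stripped text of the div with the given id, or None."""
    tag = soup.find("div", {"id": div_id})
    # find() returns None when nothing matches, so guard before reading text.
    return tag.get_text(strip=True) if tag is not None else None


print("Average Price: {}".format(price_text(soup, "avg-price")))
print("Low Price: {}".format(price_text(soup, "best-price")))
print("High Price: {}".format(price_text(soup, "worst-price")))
```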

Answer 3

Try

avgPrice[0].text

Do the same for the rest.
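
Indexing with `[0]` fits the list that `find_all` returns, while `find` hands back a single tag whose `.text` can be read directly. A short sketch on a static div from the question, under the assumption that the price is present in the HTML:

```python
from bs4 import BeautifulSoup

html = '<div id="avg-price" class="price big-price">4.02</div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so the first match is at [0].
matches = soup.find_all("div", {"id": "avg-price"})
first_from_list = matches[0].text

# find() returns the first matching Tag itself, so no index is needed.
single = soup.find("div", {"id": "avg-price"}).text

print(first_from_list, single)
```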

Extracting text from a div with BeautifulSoup

3 answers: