I'm using the following code snippet to parse part of the HTML from the link below, namely these divs:
<div id="avg-price" class="price big-price">4.02</div>
<div id="best-price" class="price big-price">0.20</div>
<div id="worst-price" class="price big-price">15.98</div>
Here is the code I'm trying to use:
import requests, urllib.parse
from bs4 import BeautifulSoup, element
r = requests.get('https://herf.io/bids?search=tatuaje%20tattoo')
soup = BeautifulSoup(r.text, 'html.parser')
avgPrice = soup.find("div", {"id": "avg-price"})
lowPrice = soup.find("div", {"id": "best-price"})
highPrice = soup.find("div", {"id": "worst-price"})
print(avgPrice)
print(lowPrice)
print(highPrice)
print("Average Price: {}".format(avgPrice))
print("Low Price: {}".format(lowPrice))
print("High Price: {}".format(highPrice))
However, the output doesn't include the prices between the div tags. Here is the result:
<div class="price big-price" id="avg-price"></div>
<div class="price big-price" id="best-price"></div>
<div class="price big-price" id="worst-price"></div>
Average Price: <div class="price big-price" id="avg-price"></div>
Low Price: <div class="price big-price" id="best-price"></div>
High Price: <div class="price big-price" id="worst-price"></div>
Any ideas? I'm sure I'm overlooking something small, but I'm at my wits' end. Haha.
Answer 0 (score: 1)
Sure, but only when the data doesn't have to be filled in by JavaScript. And here it does!
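On this site, you can use Fiddler to find the URL that the JavaScript uses to load the data, and then request the JSON (or whatever format it is) from that URL directly. Here is a simple example, written after I used Fiddler to find out where the data comes from. Keep in mind that you need to set verify=False while Fiddler is intercepting the traffic, because Fiddler decrypts HTTPS with its own root certificate, which requests does not trust.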
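import requests

with requests.Session() as se:
    # Headers copied from the browser's XHR request as seen in Fiddler
    se.headers.update({
        "X-Requested-With": "XMLHttpRequest",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36",
        "Referer": "https://herf.io/bids?search=tatuaje%20tattoo",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # "br" dropped: requests only decodes Brotli responses if the brotli package is installed
        "Accept-Encoding": "gzip, deflate",
    })
    # Form fields; requests URL-encodes this dict to search=tatuaje+tattoo&types=&sites=
    data = {"search": "tatuaje tattoo", "types": "", "sites": ""}
    # Session cookie captured in Fiddler, passed as name -> value
    # (the value is tied to that browser session and will likely need replacing)
    cookies = {"connect.sid": "s%3ANYNh5s6LzCVWY8yE9Gra8lxj9OGHPAK_.vGiBmTXvfF4iDScBF94YOXFDmC80PQxY%2FX9FLQ23hYI"}
    url = "https://herf.io/bids/search/open"  # listing endpoint (not used below)
    price = "https://herf.io/bids/search/stats"  # endpoint that returns the price statistics
    # verify=False is only needed while Fiddler intercepts HTTPS; remove it otherwise
    req = se.post(price, data=data, cookies=cookies, verify=False)
    print(req.text)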
Output
{"bottomQuarter":4.4,"topQuarter":3.31,"median":3.8,"mean":4.03,"stddev":1.44,"moe":0.08,"good":2.59,"great":1.14,"bad":5.47,"bad":6.91,"best":0.2,"worst":15.98,"count":1121}
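Since the stats endpoint returns JSON, the numbers the page shows in the divs can be read straight from it instead of from the HTML. A minimal sketch using only the standard json module; it assumes that "best" and "worst" correspond to the best-price and worst-price divs (the values 0.2 and 15.98 match) and that "mean" is what the page shows as the average price, which is an assumption since 4.03 and 4.02 differ slightly. The extract_prices helper is a hypothetical name.

```python
import json

# Trimmed copy of the response body shown above (only the keys used here are kept).
SAMPLE_BODY = '{"median": 3.8, "mean": 4.03, "best": 0.2, "worst": 15.98, "count": 1121}'


def extract_prices(body: str) -> dict:
    """Hypothetical helper: map the stats JSON onto the three values the page displays.

    Assumes "mean" ~ avg-price, "best" ~ best-price and "worst" ~ worst-price.
    """
    stats = json.loads(body)
    return {
        "average": stats["mean"],
        "low": stats["best"],
        "high": stats["worst"],
    }


prices = extract_prices(SAMPLE_BODY)
print("Average Price: {}".format(prices["average"]))
print("Low Price: {}".format(prices["low"]))
print("High Price: {}".format(prices["high"]))
```

With the live request, the same helper can be applied to req.text.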
Answer 1 (score: 0)
You can use the text attribute to extract the text:
print("Average Price: {}".format(avgPrice.text))
print("Low Price: {}".format(lowPrice.text))
print("High Price: {}".format(highPrice.text))
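To see why .text alone does not help on this particular page, compare the markup the question expects with the markup the server actually returns. The sketch below swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it runs without extra packages; DivTextByIdParser and div_texts are hypothetical names for illustration. Extracting the text works when the value is in the HTML, but the divs the server sends are empty (per Answer 0, the values are filled in later by JavaScript), so there is nothing to extract.

```python
from html.parser import HTMLParser


class DivTextByIdParser(HTMLParser):
    """Collect the text inside <div> elements, keyed by their id attribute.

    A stdlib stand-in for BeautifulSoup's find(...).text, for illustration only;
    it does not handle nested <div>s inside the target div.
    """

    def __init__(self):
        super().__init__()
        self.texts = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            div_id = dict(attrs).get("id")
            if div_id is not None:
                self._current_id = div_id
                self.texts[div_id] = ""

    def handle_endtag(self, tag):
        if tag == "div":
            self._current_id = None

    def handle_data(self, data):
        # Append text only while inside a div that has an id
        if self._current_id is not None:
            self.texts[self._current_id] += data


def div_texts(html):
    parser = DivTextByIdParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# What the question expects to find vs. what the server actually sends
rendered = '<div id="avg-price" class="price big-price">4.02</div>'
served = '<div id="avg-price" class="price big-price"></div>'

print(repr(div_texts(rendered)["avg-price"]))
print(repr(div_texts(served)["avg-price"]))
```

The first call yields '4.02' and the second an empty string, which matches the empty divs in the question's output.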
Answer 2 (score: 0)
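Try
avgPrice[0].text
and do the same for the rest. Note that the [0] index applies to the list returned by soup.find_all(...); on the single Tag returned by soup.find(...), as in the question, indexing with [0] looks up an attribute named 0 and fails, so use avgPrice.text there.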