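I have a link.
It ends with "_price_asc", which sorts the results in ascending order of price. When I open this link in a browser, the sorting works fine.
But when I try to parse the item links with bs4, I get items with seemingly random prices, i.e. the ascending sort is not applied.
What am I doing wrong?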
from urllib.request import urlopen
from bs4 import BeautifulSoup
link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'
total_links = ''
page = urlopen(link)
bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
for g in range(10):
    total_links += str(objects[g]["href"]) + '\n'
print(total_links)
Answer 0 (score: 2)
The reason this happens becomes clear if you look at the following link:
https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc
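The link ends with "#p1_price_asc". The hash sign introduces the URL fragment, which serves as an in-page marker of various kinds. The fragment is never sent to the server; on a page like this it is read by a JavaScript function running in the browser.
Because you download the page with: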
page = urlopen(link)
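the JavaScript function call that performs the sorting is never triggered. I strongly recommend reading up on the hash sign in URLs, since existing explanations cover it far better than I can.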
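To make the point about the fragment concrete, here is a minimal sketch using only the standard library. The URL is a shortened stand-in for the one in the question (same shape, fewer filters), and `request_target` is just an illustrative name for the part of the URL a server gets to see.

```python
from urllib.parse import urlsplit

# Shortened stand-in for the market URL from the question (assumption: same shape).
url = "https://steamcommunity.com/market/search?q=&appid=730#p1_price_asc"

parts = urlsplit(url)
print(parts.query)     # q=&appid=730
print(parts.fragment)  # p1_price_asc

# Only the path and query are part of the HTTP request; the fragment stays client-side.
request_target = parts.path + "?" + parts.query
print(request_target)  # /market/search?q=&appid=730
```

Since the sort order lives only in the fragment, the server has no way of knowing that a sorted page was wanted.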
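Now, as for how to achieve what you want, you have two options:
Personally I would recommend option 2, because learning Selenium can be a bit painful and is usually not worth it... I think.
Answer 1 (score: 1)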
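This page uses JavaScript to fetch the sorted data, but BeautifulSoup/urllib cannot run JavaScript.
However, using DevTools in Chrome/Firefox (tab: Network, filter: XHR), I found that the JavaScript reads JSON data from a certain URL, and that this JSON contains HTML with the sorted data. So you can request this URL yourself and feed the HTML to BeautifulSoup to get the sorted data.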
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json
# new url
link = 'https://steamcommunity.com/market/search/render/?query=&start=0&count=10&search_descriptions=0&sort_column=price&sort_dir=asc&appid=730&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife'
page = urlopen(link)
data = json.loads(page.read().decode())
html = data['results_html']
bs_page = BeautifulSoup(html, features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
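# collect (price, link) pairs; note this reuses the names `data` and `link`
data = []
for g in objects:
    link = g["href"]
    # the span carrying a data-price attribute holds the displayed price text
    price = g.find('span', {'data-price': True}).text
    data.append((price, link))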
print("\n".join(f"{price} | {link}" for price, link in data))
Result:
$67.43 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Urban%20Masked%20%28Field-Tested%29
$67.70 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Field-Tested%29
$69.00 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Minimal%20Wear%29
$69.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Battle-Scarred%29
$69.48 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Safari%20Mesh%20%28Field-Tested%29
$70.32 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Battle-Scarred%29
$70.90 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Well-Worn%29
$70.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Field-Tested%29
$71.99 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Boreal%20Forest%20%28Field-Tested%29
$72.08 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Field-Tested%29
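If you want to vary the request, the query string of the new URL can be assembled from its parts instead of being pasted as one long literal. The sketch below is only an illustration: `build_render_url` is a hypothetical helper, the parameter names are copied from the URL above, and treating `start` as a result offset for paging is an assumption based on its name, not something confirmed by Steam documentation.

```python
from urllib.parse import urlencode

BASE = "https://steamcommunity.com/market/search/render/"

def build_render_url(start=0, count=10, sort_column="price", sort_dir="asc"):
    """Rebuild the render URL used above from individual parameters."""
    params = [
        ("query", ""),
        ("start", start),            # assumed to be an offset into the results
        ("count", count),
        ("search_descriptions", 0),
        ("sort_column", sort_column),
        ("sort_dir", sort_dir),
        ("appid", 730),
        ("category_730_ItemSet[]", "any"),
        ("category_730_ProPlayer[]", "any"),
        ("category_730_StickerCapsule[]", "any"),
        ("category_730_TournamentTeam[]", "any"),
        ("category_730_Weapon[]", "any"),
        ("category_730_Type[]", "tag_CSGO_Type_Knife"),
    ]
    # urlencode percent-encodes the "[]" in the keys as %5B%5D
    return BASE + "?" + urlencode(params)

first_page = build_render_url()
second_page = build_render_url(start=10)  # next 10 results, if start is an offset
```

With the default arguments this reproduces the URL used in the code above, so the rest of that code can stay unchanged.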
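By the way: this was my first version, which reads from the old URL and sorts the items in Python. It can only sort the data on the first page, though. To get a better result it would have to read all the pages, which would take a lot of time.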
from urllib.request import urlopen
from bs4 import BeautifulSoup
link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'
page = urlopen(link)
bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
data = []
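for g in objects:
    link = g["href"]
    # data-price holds the price in cents, so it can be compared as an integer
    price = g.find('span', {'data-price': True})['data-price']
    price = int(price)
    data.append((price, link))
# tuples sort by their first element, i.e. by price
data = sorted(data)
print("\n".join(f"${price/100:.2f} USD | {link}" for price, link in data))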
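The conversion to `int` in this version matters: sorting the formatted price text instead would compare character by character. A small sketch with made-up rows (the prices and link names below are invented purely for illustration) shows the difference:

```python
# Hypothetical sample rows: (price in cents, link). Values are invented.
rows = [(10000, "link-c"), (6743, "link-a"), (6770, "link-b")]

# Sorting the display text compares strings, so "$100.00" lands before "$67.43".
as_text = sorted(f"${cents / 100:.2f}" for cents, _ in rows)
print(as_text)   # ['$100.00', '$67.43', '$67.70']

# Sorting the (int, link) tuples compares the numeric price first.
as_ints = sorted(rows)
print(as_ints)   # [(6743, 'link-a'), (6770, 'link-b'), (10000, 'link-c')]
```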