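I'm trying to scrape the URL 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1' (for reference), but I can't figure out how to get to the next page. My current code is below, but it just keeps iterating over the first page instead of moving on to the next one.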
import urllib2
from bs4 import BeautifulSoup
page_num = 1
while True:
    # The page number is appended after '#', i.e. as a URL fragment
    url = 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p' + str(page_num)
    open_url = urllib2.urlopen(url).read()
    market_page = BeautifulSoup(open_url, 'html.parser')
    # One div per listing in the search results
    for i in market_page('div', {'class': 'market_listing_row market_recent_listing_row market_listing_searchresult'}):
        item_name = i.find('span', {'class': 'market_listing_item_name'}).get_text()
        price = i.find_all('span')[1].get_text()
        print item_name + ' costs ' + price
    # Move on to the next page once every listing on this one is printed
    page_num += 1
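Edit: Another problem with the page I'm trying to scrape is that the link to the next page has no href, so I use a loop to request different URLs instead, but it just scrapes the first URL over and over.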
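A likely cause, illustrated with Python 3's standard `urllib.parse` module (the question's code uses Python 2's `urllib2`): everything after `#` is a URL fragment, and HTTP clients keep the fragment on the client side instead of sending it to the server. The minimal sketch below strips the fragment from the question's page URLs with `urldefrag()`; the helper name `request_urls` is hypothetical.

```python
from urllib.parse import urldefrag

# Search URL from the question, without the "#p<n>" suffix.
BASE = ("http://steamcommunity.com/market/search?q="
        "&category_730_ItemSet%5B%5D=any"
        "&category_730_TournamentTeam%5B%5D=any"
        "&category_730_Weapon%5B%5D=any"
        "&category_730_Type%5B%5D=tag_CSGO_Type_Knife"
        "&appid=730")


def request_urls(pages):
    """Return the part of each page URL that is actually sent to the server.

    urldefrag() splits off the fragment (everything after '#'), which is
    never part of the HTTP request.
    """
    return [urldefrag(BASE + "#p" + str(n)).url for n in range(1, pages + 1)]


urls = request_urls(3)
print(len(set(urls)))  # → 1: all three "pages" are the same request
```

Since the server always receives the same search URL, it presumably returns the same first page on every iteration; the `#p2`, `#p3`, ... navigation on the live page appears to be handled by the site's JavaScript rather than by separate page requests.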
Answer (score: 1)
import urllib2
from bs4 import BeautifulSoup
pages = 90  # number of result pages to visit
for page in range(1, pages + 1):  # pages are numbered from 1 (#p1, #p2, ...)
    url = 'http://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p' + str(page)
    open_url = urllib2.urlopen(url).read()
    market_page = BeautifulSoup(open_url, 'html.parser')
    # One div per listing in the search results
    for i in market_page('div', {'class': 'market_listing_row market_recent_listing_row market_listing_searchresult'}):
        item_name = i.find('span', {'class': 'market_listing_item_name'}).get_text()
        price = i.find_all('span')[1].get_text()
        print item_name + ' costs ' + price
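Because the `#p` fragment is not sent to the server, a loop that only changes the fragment, including the one above, is likely to keep receiving the first page. A common alternative is to put the page offset in the query string. The minimal Python 3 sketch below assumes Steam exposes a JSON endpoint at `https://steamcommunity.com/market/search/render/` that accepts `start`, `count` and `norender=1`, and that its response contains a `results` list with `name` and `sell_price_text` keys; all of these are unverified assumptions (check them in the browser's network tab), and `page_url`/`fetch_page` are hypothetical helpers. Only the knife type filter is carried over, since the other filters in the URL are set to `any`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: JSON search endpoint; verify before relying on it.
RENDER_URL = "https://steamcommunity.com/market/search/render/"


def page_url(page, per_page=10):
    """Build the URL for 1-based `page`, with the offset in the query string."""
    params = {
        "query": "",
        "appid": 730,
        "category_730_Type[]": "tag_CSGO_Type_Knife",  # urlencode escapes [] as %5B%5D
        "start": (page - 1) * per_page,  # offset of the first listing on this page
        "count": per_page,
        "norender": 1,  # ASSUMPTION: requests plain JSON data instead of rendered HTML
    }
    return RENDER_URL + "?" + urlencode(params)


def fetch_page(page, per_page=10):
    """Download one page and return a list of (name, price) pairs.

    ASSUMPTION: the JSON response has a "results" list whose items carry
    "name" and "sell_price_text" keys.
    """
    with urlopen(page_url(page, per_page)) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return [(r["name"], r["sell_price_text"]) for r in data.get("results", [])]
```

For example, `page_url(3)` produces a URL containing `start=20`, so each iteration of `for page in range(1, pages + 1): fetch_page(page)` would send a genuinely different request to the server.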