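I am trying to get the titles of all lots (pages 1 to 33) of the "Hong Kong Watches 2.0" auction on the Bonhams website (https://www.bonhams.com/auctions/25281/?category=results#/!). I am new to Python and Selenium, but I tried to get the results with the code below. The code gives me the results I want, but only for page 1. After that it keeps repeating the results of page 1. The loop does not seem to click through to the next page. Can anyone help me fix this?
Below is the code I used: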
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome()
driver.get('https://www.bonhams.com/auctions/25281/?category=results#/!')
while True:
    next_page_btn = driver.find_elements_by_xpath("//*[@id='lots']/div[2]/div[5]/div/a[10]/div")
    if len(next_page_btn) < 1:
        print("no more pages left")
        break
    else:
        titles = driver.find_elements_by_xpath("//*[@class='firstLine']")
        titles = [title.text for title in titles]
        print(titles)
        element = WebDriverWait(driver, 5).until(expected_conditions.element_to_be_clickable((By.ID, 'lots')))
        driver.execute_script("return arguments[0].scrollIntoView();", element)
        element.click()
Below is the output I get. Python keeps repeating/loading this output (I think it does so 33 times?).
['Hong Kong Watches 2.0', '', 'OMEGA. A Very Fine And Rare Limited Edition
Yellow Gold Chronograph Bracelet Watch, Commemorating the Apollo 11 Space
Mission And The Successful Moon Landing in 1969', '', '', '', 'ROLEX. TWO
SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', '', 'ROLEX.
TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s',
'', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL
DISHES', '', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', '',
'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', '', 'PATEK PHILIPPE. TWO
SETS OF CUFFLINKS', '', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With
8-Days Power Reserve and Alarm', '', 'Cartier & LeCoultre. A group of
three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', '',
'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve',
'', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with
Alarm', '', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome
Enamel Dial', '', 'Vacheron Constantin. A Large Polished Metal Perpetual
Calendar Wall Clock']
['Hong Kong Watches 2.0', '', 'OMEGA. A Very Fine And Rare Limited Edition
Yellow Gold Chronograph Bracelet Watch, Commemorating the Apollo 11 Space
Mission And The Successful Moon Landing in 1969', '', '', '', 'ROLEX. TWO
SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', '', 'ROLEX.
TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s',
'', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL
DISHES', '', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', '',
'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', '', 'PATEK PHILIPPE. TWO
SETS OF CUFFLINKS', '', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With
8-Days Power Reserve and Alarm', '', 'Cartier & LeCoultre. A group of
three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', '',
'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve',
'', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with
Alarm', '', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome
Enamel Dial', '', 'Vacheron Constantin. A Large Polished Metal Perpetual
Calendar Wall Clock']
Answer 0 (score: 0)
You don't need the selenium library to scrape this data. You can also fetch the data for every page with the requests and BeautifulSoup libraries.
import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Accept": "application/json"
}
page_num = 1
title_list = []  # one list of titles per page
while True:
    # Request one page of lots as JSON instead of rendering the page in a browser
    url = 'https://www.bonhams.com/api/v1/lots/25281/?category=results&length=12&minimal=false&page={}'.format(page_num)
    print("===url===", url)
    response = requests.get(url, headers=headers).json()
    max_lot = response['max_lot']
    if not response['lots']:
        # Guard against looping forever if a page comes back empty
        break
    last_iSaleLotNo = 0
    titles = []
    for lot in response['lots']:
        last_iSaleLotNo = lot['lot_id_combined']
        # styled_title is an HTML fragment; the title is in the div with class "firstLine"
        title = BeautifulSoup(lot['styled_title'], 'lxml').find("div", {'class': 'firstLine'}).text.strip()
        titles.append(title)
    title_list.append(titles)
    print("===titles===", titles)
    # Stop once the last lot on this page is the final lot of the sale
    if int(max_lot) == int(last_iSaleLotNo):
        break
    page_num += 1
print(title_list)
Output for the first page:
['ROLEX. TWO SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', 'ROLEX. TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL DISHES', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', 'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', 'PATEK PHILIPPE. TWO SETS OF CUFFLINKS', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve and Alarm', 'Cartier & LeCoultre. A group of three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with Alarm', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome Enamel Dial', 'Vacheron Constantin. A Large Polished Metal Perpetual Calendar Wall Clock']
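The loop above stops when the last lot on a page equals max_lot. A minimal sketch of that stopping rule, separated from the HTTP call so it can be tried offline, might look like the following. The names collect_titles, fetch_page, fake_fetch and FAKE_PAGES are hypothetical, and the 'title' key is an assumption made for this sketch (the real response carries the title inside styled_title). Unlike the code above, it returns one flat list rather than one list per page.

```python
from typing import Callable, Dict, List


def collect_titles(fetch_page: Callable[[int], Dict], max_pages: int = 100) -> List[str]:
    """Walk pages 1, 2, ... and return one flat list of lot titles.

    fetch_page is a hypothetical callable (not part of any library) that
    returns a dict shaped like the JSON used in the answer:
    {'max_lot': ..., 'lots': [{'lot_id_combined': ..., 'title': ...}, ...]}.
    The 'title' key is an assumption made for this sketch.
    """
    titles: List[str] = []
    for page_num in range(1, max_pages + 1):
        data = fetch_page(page_num)
        lots = data.get('lots', [])
        if not lots:
            break  # empty page: nothing more to read
        titles.extend(lot['title'] for lot in lots)
        # Stop once the last lot on this page is the final lot of the sale.
        if int(lots[-1]['lot_id_combined']) == int(data['max_lot']):
            break
    return titles


# Fake two-page sale used in place of a real HTTP call.
FAKE_PAGES = {
    1: {'max_lot': 3, 'lots': [{'lot_id_combined': 1, 'title': 'Lot A'},
                               {'lot_id_combined': 2, 'title': 'Lot B'}]},
    2: {'max_lot': 3, 'lots': [{'lot_id_combined': 3, 'title': 'Lot C'}]},
}


def fake_fetch(page_num: int) -> Dict:
    """Return the fake page, or an empty page past the end."""
    return FAKE_PAGES.get(page_num, {'max_lot': 3, 'lots': []})


print(collect_titles(fake_fetch))  # prints ['Lot A', 'Lot B', 'Lot C']
```

To use it against the real site, fetch_page would wrap the requests.get(...).json() call from the answer and map each lot's styled_title to a plain title. The max_pages cap is an extra safety limit and is not part of the original answer.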