I want to scrape data from the first 30 pages of a website. The expected output is a single DataFrame, but it only scrapes page 1.
My code:
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup
import re

options = webdriver.ChromeOptions()
options.add_argument('-headless')
options.add_argument('-no-sandbox')
options.add_argument('-disable-dev-shm-usage')
# start Chrome with the options above
driver = webdriver.Chrome(options=options)

url = "https://bonbanh.com/oto/page,"
data = []
for i in range(1,10):
    # load page i of the listing
    driver.get(url + str(i))
    x=driver.find_element_by_xpath("/html/body/div/div[6]/div[4]/div/div/div[2]/div[1]").text
    print(x)
    # read each column of the listing by its CSS class
    elements = driver.find_elements_by_css_selector(".cb1")
    types = [el.text for el in elements]
    elements = driver.find_elements_by_css_selector(".cb2_02")
    names = [el.text for el in elements]
    elements = driver.find_elements_by_css_selector(".cb3")
    prices = [el.text for el in elements]
    elements = driver.find_elements_by_css_selector(".cb4")
    address = [el.text for el in elements]
    # one DataFrame per page, collected in the data list
    df = pd.DataFrame({'TEN_XE':names,'LOAI_XE':types, 'GIA_XE': prices, 'DIA_CHI': address})
    data.append(df)
I don't know why it only scrapes page 1. Thanks!!
Answer 0 (score: 0)
The problem is in this code:
types = driver.find_elements_by_css_selector(".cb1")
types = [el.text for el in elements]
Because your `elements` is undefined, you need to change it:
year = driver.find_elements_by_css_selector(".cb1")
types = [el.text for el in year]
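Separately from the variable name, it may help to check how the pages are looped over and how the rows are accumulated. Below is a minimal sketch, not the asker's actual scraper: `page_urls`, `collect_rows`, and `fake_scrape_page` are hypothetical names, and `fake_scrape_page` is a stand-in that returns dummy rows instead of calling Selenium. It only illustrates visiting pages 1 through 30 and keeping the rows from every page in one list.

```python
from typing import Callable, Dict, List

BASE_URL = "https://bonbanh.com/oto/page,"  # base URL taken from the question


def page_urls(base: str, first: int, last: int) -> List[str]:
    """Build one URL per page, including both the first and the last page."""
    return [base + str(n) for n in range(first, last + 1)]


def collect_rows(
    urls: List[str],
    scrape_page: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Call scrape_page on every URL and gather all rows in one list."""
    rows: List[Dict[str, str]] = []
    for url in urls:
        # extend, rather than reassign, so rows from earlier pages are kept
        rows.extend(scrape_page(url))
    return rows


def fake_scrape_page(url: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the Selenium code: two dummy rows per page."""
    page = url.rsplit(",", 1)[1]  # page number is the text after the last comma
    return [
        {
            "TEN_XE": f"car-{page}-{k}",
            "LOAI_XE": "type",
            "GIA_XE": "price",
            "DIA_CHI": "address",
        }
        for k in range(2)
    ]


urls = page_urls(BASE_URL, 1, 30)
rows = collect_rows(urls, fake_scrape_page)
print(len(urls), len(rows))
```

Note that `range(1,10)` in the question covers only pages 1 to 9, so 30 pages would need `range(1, 31)`. With a real scraper in place of `fake_scrape_page`, passing `rows` to `pd.DataFrame(rows)` would give one DataFrame; alternatively, the per-page frames already stored in `data` can be combined with `pd.concat(data, ignore_index=True)`.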