I am trying to scrape a table that is generated by JavaScript, but I am struggling. My code so far is:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://af.ktnlandscapes.com/")
# get table -- first wait for table to fully load
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='list-view']/tbody/tr")))
table = driver.find_element(By.XPATH, "//*[@id='list-view']")
# get rows
rows = table.find_elements(By.XPATH, "tbody/tr")
# iterate over rows and print each row's "listing" attribute
for row in rows:
    print(row.get_attribute("listing"))
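I want to scrape the "listing=" numbers from the table. I am not sure how to extract the listing number, and I am struggling to understand how to force the page to load the remaining rows, since those rows are only loaded as you scroll down the table.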
Answer 0 (score: 4)
Try the following code:
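import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://af.ktnlandscapes.com/")
# get table -- first wait for table to fully load
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='list-view']/tbody/tr")))
table = driver.find_element(By.XPATH, "//*[@id='list-view']")
get_number = 0
while True:
    count = get_number  # row count from the previous pass
    rows = table.find_elements(By.XPATH, "tbody/tr[@class='list-view-listing']")
    driver.execute_script("arguments[0].scrollIntoView();", rows[-1])  # scroll to last row
    get_number = len(rows)
    print(get_number)
    time.sleep(1)  # wait for the next batch of rows to load
    if get_number == count:  # no new rows appeared, so everything is loaded
        break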
Output:
20
40
60
80
100
120
140
160
180
200
220
240
260
280
300
320
339
339
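The scroll-and-recount loop above can also be written as a small helper that does not depend on Selenium, which makes the stopping rule easy to check in isolation. This is only a sketch: `load_until_stable`, `get_rows`, `scroll_to`, `max_rounds`, and `FakeTable` are hypothetical names introduced here, not part of Selenium or of the page.

```python
import time


def load_until_stable(get_rows, scroll_to, pause=1.0, max_rounds=100):
    """Scroll to the last row until the row count stops growing.

    get_rows  -- callable returning the rows currently in the table
    scroll_to -- callable that scrolls a given row into view
    Both are hypothetical hooks supplied by the caller.
    """
    previous = -1
    rows = get_rows()
    for _ in range(max_rounds):
        if not rows or len(rows) == previous:
            break  # nothing loaded, or no new rows since the last scroll
        previous = len(rows)
        scroll_to(rows[-1])  # trigger loading of the next batch
        time.sleep(pause)    # give the page time to append rows
        rows = get_rows()
    return rows


# Stand-in for a lazily loading table: 45 rows, 20 more revealed per scroll.
class FakeTable:
    def __init__(self, total, step):
        self.total, self.step, self.visible = total, step, step

    def rows(self):
        return list(range(min(self.visible, self.total)))

    def scroll(self, _row):
        self.visible += self.step


fake = FakeTable(total=45, step=20)
loaded = load_until_stable(fake.rows, fake.scroll, pause=0)
print(len(loaded))  # 45
```

With Selenium, `get_rows` could be `lambda: table.find_elements(By.XPATH, "tbody/tr[@class='list-view-listing']")` and `scroll_to` could be `lambda row: driver.execute_script("arguments[0].scrollIntoView();", row)`. Once the rows are loaded, the listing numbers asked about in the question would be `[row.get_attribute("listing") for row in rows]`.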
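Answer 1 (score: 2)
It may be simpler to use requests. If you inspect the page in Chrome/Firefox, you will see that scrolling the list area sends a GET request to fetch more data. The endpoint is /list-view-load.php?landscape_id=31&landscape_nid=33192&region=All&category=All&subcategory=All&search=&custom1=&custom2=&custom3=&custom4=&custom5=&offset=20, and the offset increases by 20 with each request.
You can mimic this as follows: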
import requests
from lxml import html

sess = requests.Session()
url = ('https://af.ktnlandscapes.com/sites/all/themes/landscape_tools/functions'
       '/list-view-load.php?landscape_id=31&landscape_nid=33192&region=All&'
       'category=All&subcategory=All&search=&custom1=&custom2=&custom3=&'
       'custom4=&custom5=&offset={offset}')

# request pages of 20 rows until the server returns no more data
gets = []
for i in range(50):
    data = sess.get(url.format(offset=20*i)).json().get('data')
    if not data:
        break
    gets.append(data)
    print(f'\rfinished request {i}', end='')
else:
    # the loop never hit an empty page, so 50 requests were not enough
    print('There is more data!! Increase the range.')

# each response is an HTML fragment of <tr> rows; collect their listing attributes
listings = []
for g in gets:
    h = html.fromstring(g)
    listings.extend(h.xpath('tr/@listing'))
print('Number of listings:', len(listings))
# prints:
Number of listings: 339
listings
# returns
['91323', '91528', '91282', '91529', '91572', '91356', '91400', '91445',
'91373', '91375', '91488', '91283', '91294', '91324', '91423', '91325',
'91475', '91415', '91382', '91530', '91573', '91295', '91326', '91424',
...
'91568', '91592', '91613', '91569', '91593', '91594', '91570', '91352',
'91414', '91486', '91353', '91304', '91311', '91354', '91399', '91602',
'91571', '91610', '103911']
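To make the two moving parts of this approach explicit, the sketch below builds the paginated query string and pulls the `listing` attributes out of an HTML fragment using only the standard library. It assumes each response's `data` field is a run of `<tr listing="...">` rows, as the XPath above implies. `page_url`, `ListingParser`, `extract_listings`, and the sample fragment are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE = ('https://af.ktnlandscapes.com/sites/all/themes/landscape_tools'
        '/functions/list-view-load.php')
PARAMS = {
    'landscape_id': 31, 'landscape_nid': 33192, 'region': 'All',
    'category': 'All', 'subcategory': 'All', 'search': '',
    'custom1': '', 'custom2': '', 'custom3': '', 'custom4': '',
    'custom5': '',
}


def page_url(page, page_size=20):
    """Return the URL for a zero-based page (offset = page * page_size)."""
    return BASE + '?' + urlencode({**PARAMS, 'offset': page * page_size})


class ListingParser(HTMLParser):
    """Collect the `listing` attribute of every <tr> that has one."""

    def __init__(self):
        super().__init__()
        self.listings = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get('listing')
        if tag == 'tr' and value is not None:
            self.listings.append(value)


def extract_listings(fragment):
    parser = ListingParser()
    parser.feed(fragment)
    parser.close()
    return parser.listings


# Fragment invented for illustration; only the two IDs come from the output above.
sample = ('<tr listing="91323"><td>A</td></tr>'
          '<tr listing="91528"><td>B</td></tr>')
print(extract_listings(sample))                    # ['91323', '91528']
print(page_url(2).endswith('custom5=&offset=40'))  # True
```

Using `html.parser` instead of lxml is a substitution made here to keep the sketch dependency-free; the lxml XPath in the answer does the same job in one line.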