I am scraping a URL like this one, which loads its data via Ajax. With Firefox I can get the HTML, but with PhantomJS it returns:
<html><head></head><body></body></html>
Here is the code:
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://sports.bovada.lv/live-betting/event/2391243'
# Path to the local PhantomJS 1.9.8 binary
driver = webdriver.PhantomJS('/Setups/phantomjs-1.9.8-macosx/bin/phantomjs')
driver.set_window_size(1128, 768)  # optional
driver.get(url)
wait = WebDriverWait(driver, 3000)  # only used by the commented-out wait below
sleep(40)  # fixed delay to give the Ajax content time to load
# wait.until(EC.staleness_of(driver.find_element_by_id("coupon")), 'visible')
html = driver.page_source
# userElement = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "coupon")))
print(html)
Update
OK, this happens with every URL, whether or not it uses Ajax.
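Since the empty skeleton comes back for every URL, the Ajax wait is probably not the cause. One possibility worth ruling out (an assumption, not something confirmed here) is that PhantomJS 1.9.8 fails the HTTPS handshake and silently returns an empty document. Below is a minimal diagnostic sketch that passes PhantomJS's `--ignore-ssl-errors` and `--ssl-protocol` command-line options through Selenium's `service_args` and checks whether the result is still the empty skeleton. `is_empty_page`, `fetch`, and `SERVICE_ARGS` are hypothetical names introduced for illustration, and the sketch assumes a Selenium version that still ships the PhantomJS driver, as the code above does.

```python
import re

# Options forwarded to the PhantomJS binary (assumed relevant to this problem).
SERVICE_ARGS = ['--ignore-ssl-errors=true', '--ssl-protocol=any']


def is_empty_page(html):
    """Return True if html is only the empty skeleton shown in the question."""
    compact = re.sub(r'\s+', '', html).lower()
    return compact == '<html><head></head><body></body></html>'


def fetch(url, phantomjs_path):
    """Load url in PhantomJS with relaxed SSL settings and return the page source."""
    # Imported here so is_empty_page can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.PhantomJS(phantomjs_path, service_args=SERVICE_ARGS)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Calling `fetch(url, '/Setups/phantomjs-1.9.8-macosx/bin/phantomjs')` and passing the result to `is_empty_page` would show whether these options change anything. If the page is still empty, the cause lies elsewhere.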