I am using Selenium to fetch data from angel.co, but I am still not getting any data from the site.
from scrapy import Request, Spider
import urllib
from selenium import webdriver


class AngelSpider(Spider):
    name = "angel"
    allowed_domains = ["angel.co"]
    AJAXCRAWL_ENABLED = True
    start_urls = (
        "https://angel.co/companies?locations[]=India",
    )

    def __init__(self):
        self.path = '/usr/lib/chromium-browser/chromedriver'
        self.driver = webdriver.Chrome(self.path)

    def parse(self, response):
        self.driver.get(response.url)
        self.driver.implicitly_wait(50)
        while True:
            next = self.driver.find_element_by_css_selector("div.more")
            try:
                next.click()
                self.driver.implicitly_wait(10)
                divs = self.driver.find_element_by_xpath("//div[@class= 'results']")
                for div in divs:
                    name = divs.find_element_by_css_selector("div.name")
                    print name.text
            except:
                break
Answer 0 (score: 0):
The reason you don't see anything printed is that you are using a bare except clause, which essentially swallows every raised exception silently.
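For instance, narrowing the except and printing the error makes the real failure visible. Below is a minimal sketch of the question's while-loop rewritten that way; it assumes the spider's self.driver from the question and uses exception classes from selenium.common.exceptions:

from selenium.common.exceptions import NoSuchElementException, WebDriverException

while True:
    try:
        more = self.driver.find_element_by_css_selector("div.more")
        more.click()
    except NoSuchElementException:
        # The "more" button is gone - normal end of pagination.
        break
    except WebDriverException as exc:
        # Any other Selenium error is printed instead of silently discarded.
        print("Stopping pagination loop:", exc)
        break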
The actual problem is in how you locate the elements on the page. On this line you are using the find_element_by_xpath() method, which finds a single div element:
divs = self.driver.find_element_by_xpath("//div[@class= 'results']")
divs is therefore a single WebElement instance, which is not iterable, so iterating over it fails on the next line:
for div in divs:
Instead, what you should do is something like this:
results = self.driver.find_elements_by_css_selector(".results > div")
for result in results:
    name = result.find_element_by_css_selector(".name")
    print(name.text)
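Putting the two fixes together, the spider's parse() could look roughly like the sketch below. It is only an illustration: the .results > div, .name and div.more selectors come from this answer and the question and may need adjusting to the live page. Clicking "more" until it disappears and collecting the rows once afterwards also avoids printing the same names repeatedly.

from selenium.common.exceptions import NoSuchElementException, WebDriverException

def parse(self, response):
    self.driver.get(response.url)
    self.driver.implicitly_wait(10)
    # Keep clicking "more" until the button disappears or clicking fails.
    while True:
        try:
            self.driver.find_element_by_css_selector("div.more").click()
        except (NoSuchElementException, WebDriverException):
            break
    # All result rows are now in the DOM; print each company name once.
    for result in self.driver.find_elements_by_css_selector(".results > div"):
        name = result.find_element_by_css_selector(".name")
        print(name.text)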