Question

This is my first attempt at using Python with selenium and bs4. I am trying to scrape data from this website.

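First, I select GE from the cantone dropdown menu, tick the checkbox "Conffermo" and click the button "Ricerca". Then I can see the data. I have to click each arrow to expand the data and scrape it for every person (that is a loop, isn't it?). Then I need to do the same on the next page (by clicking "Affiggere le seguenti entrate" at the bottom of the page).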
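The expand-then-paginate flow described above could be sketched roughly as follows. This is only an outline under assumptions: `scrape_all_pages`, `read_rows`, `pause` and `max_pages` are hypothetical names, the next-page control is assumed to be whatever element directly contains the text "Affiggere le seguenti entrate", and the fixed sleep is a crude stand-in for a proper wait. The strings "class name" and "xpath" are the values behind Selenium's `By.CLASS_NAME` and `By.XPATH`.

```python
import time


def scrape_all_pages(driver, read_rows, pause=2.0, max_pages=100):
    """Expand every row on each results page, then follow the next-page control.

    driver    -- a Selenium WebDriver (anything with find_elements works)
    read_rows -- hypothetical callback returning the records of the current page
    """
    # Assumption: the control is an element whose own text contains this label.
    next_xpath = "//*[text()[contains(., 'Affiggere le seguenti entrate')]]"
    collected = []
    for _ in range(max_pages):  # hard cap so the loop always terminates
        # Elements are looked up again on every page, so none of them go stale.
        for arrow in driver.find_elements("class name", "footable-toggle"):
            arrow.click()  # expand the hidden details of one person
        collected.extend(read_rows(driver))
        next_controls = driver.find_elements("xpath", next_xpath)
        if not next_controls:
            break  # no next-page control found: assume this was the last page
        next_controls[0].click()
        time.sleep(pause)  # crude wait for the next page to render
    return collected
```

`find_elements` returns an empty list instead of raising when nothing matches, which is why it is used here for the "is there another page?" check.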

I would like to use relative XPaths for the data, because not every person has every field (when a value is missing, I want an empty cell in Excel).
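To illustrate the relative-lookup idea without a browser, here is a small sketch using the standard library's `xml.etree.ElementTree`. The markup, the class names and the `extract_people` helper are all made up for the example; the real page's structure will differ. The point is only that a path starting with `.` is evaluated relative to one row, and that a missing match becomes an empty string.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page's tags and class names will differ.
SAMPLE = """
<table>
  <tr class="person">
    <td class="cognome">Rossi</td>
    <td class="cellulare">000 000 00 00</td>
    <td class="email">rossi@example.com</td>
  </tr>
  <tr class="person">
    <td class="cognome">Bianchi</td>
  </tr>
</table>
"""

FIELDS = ("cognome", "cellulare", "email")


def extract_people(markup):
    """Return one dict per row, with "" for every field the row lacks."""
    root = ET.fromstring(markup.strip())
    people = []
    for row in root.findall(".//tr[@class='person']"):
        # The leading "." keeps each lookup inside this row only.
        person = {
            field: row.findtext(f".//td[@class='{field}']", default="").strip()
            for field in FIELDS
        }
        people.append(person)
    return people


people = extract_people(SAMPLE)
```

With Selenium the same pattern would be to call `find_elements` on the row element with an XPath that starts with `./` or `.//`, and to treat an empty result list as a missing value.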

This is my code so far:

  import time
  from bs4 import BeautifulSoup
  from selenium import webdriver
  from selenium.webdriver.common.by import By
  browser = webdriver.Firefox()
  URL = 'http://www.asca.ch/Partners.aspx?lang=it'
  browser.get(URL)  # load the page in the browser
  time.sleep(10)  # crude wait for the page to finish loading
  soup = BeautifulSoup(browser.page_source, 'html.parser')  # parse the HTML the browser received
  # open the canton dropdown and choose its ninth entry (GE)
  browser.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_Input"]').click()
  browser.find_element(By.XPATH, '/html/body/form/div[1]/div/div/ul/li[9]').click()
  # tick the disclaimer checkbox and submit the search
  browser.find_element(By.XPATH, "//input[@id='MainContent__chkDisclaimer']").click()
  browser.find_element(By.XPATH, "//input[@id='MainContent_btn_submit']").click()
  # first expand arrow in the results table
  arrow = browser.find_element(By.CLASS_NAME, "footable-toggle")

I am stuck after this. The data I want to scrape (into Excel columns) are: Discipline, Cognome, Cellulare and Email.
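One possible way to get those four columns into something Excel can open is a CSV file written with the standard `csv` module. This is a sketch, not the only option: `write_people` and the sample records are invented for the example, and the column names are simply the ones listed above. `restval=""` makes `DictWriter` emit an empty cell for any key a record lacks.

```python
import csv
import io

COLUMNS = ["Discipline", "Cognome", "Cellulare", "Email"]


def write_people(people, handle):
    """Write one CSV row per person; missing keys become empty cells."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(people)


# Made-up records, each missing some of the fields.
sample = [
    {"Cognome": "Rossi", "Email": "rossi@example.com"},
    {"Cognome": "Bianchi", "Cellulare": "000 000 00 00"},
]
buffer = io.StringIO()
write_people(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, the handle could come from `open("people.csv", "w", newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding adds a byte-order mark that helps Excel recognise accented characters.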

Thanks for your help.

How can I loop over the spans (arrows) to scrape the data, and repeat that loop across consecutive pages?

1 answer: