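I am trying to scrape data from this web page.
I need to copy the contents of the table into a CSV file, then go to the next page and append that page's contents to the same file. I can scrape the table, but when I try to click the Next button in a loop using Selenium WebDriver's click, it goes to the next page and stops. Here is my code.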
driver = webdriver.Chrome(executable_path = 'path')
url = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'
def data_from_cricinfo(url):
    driver.get(url)
    pgsource = str(driver.page_source)
    soup = BeautifulSoup(pgsource, 'html5lib')
    data = soup.find_all('div', class_ = 'engineTable')
    for tr in data:
        info = tr.find_all('tr')
        # grab data
    next_link = driver.find_element_by_class_name('PaginationLink')
    next_link.click()
data_from_cricinfo(url)
Is there a way to click Next on every page in a loop and copy the contents of all pages into the same file? Thanks in advance.
Answer 0 (score: 1)
You can do the following to step through all the pages (via the Next button) and parse the data from the table:
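from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

URL = 'http://stats.espncricinfo.com/ci/engine/stats/index.html?class=1;team=5;template=results;type=batting'

driver = webdriver.Chrome()
driver.get(URL)

while True:
    # Parse whichever page the browser currently shows.
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    # The results table is the third element with class "engineTable".
    table = soup.find_all(class_='engineTable')[2]
    for info in table.find_all('tr'):
        data = [item.text for item in info.find_all("td")]
        print(data)
    try:
        # find_element(By...) replaces find_element_by_partial_link_text,
        # which was removed in Selenium 4.3.
        driver.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()
    except NoSuchElementException:
        # No "Next" link found, which is taken to mean the last page.
        break

driver.quit()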
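
The loop above only prints each row, while the question asks for everything to end up in one CSV file. Here is a minimal sketch of how that could be done with the standard csv module. The helper name append_rows and the file name batting.csv are my own placeholders, not something from the question, and the sketch assumes each row is a list of cell strings like the data list built in the loop.

```python
import csv


def append_rows(path, rows):
    """Append rows to the CSV file at `path`, creating it if it does not exist.

    `rows` is an iterable of lists of cell strings. Empty lists are skipped:
    a <tr> with no <td> cells (for example a header row built from <th>)
    produces [] in the scraping loop.
    """
    # Mode "a" appends, so rows from every page go into the same file.
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            if row:
                writer.writerow(row)
```

To use it, collect the data lists for one page into a list inside the while loop and call append_rows('batting.csv', page_rows) before clicking Next. Because the file is opened in append mode, delete any old copy before a fresh run, or the new rows will be added after the old ones.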