How do I parse the table on this page?

Time: 2019-06-06 23:22:54

Tags: python-3.x selenium web-scraping beautifulsoup

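I would like to parse the table with id = standings-16548-grid and class = grid with-centered-columns hover. Unfortunately, when I try it, the output shows every tr as completely empty. I am new to the language, so I wonder whether I am missing something.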

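Later I would also like to scrape the "Form" table, not only the "Standings" table, but I am taking them one at a time.

Below you can find my code.

I tried opening the page in Firefox with Selenium. Then I tried to click the button that appears when the page opens, so that I can continue using the site. Finally, I used BeautifulSoup to try to parse the table with the given id.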

Python 3.7
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

driver = webdriver.Firefox(executable_path='/Applications/Python3.7/geckodriver')
driver.get('https://www.whoscored.com/Regions/108/Tournaments/5/Italy-Serie-A')
driver.implicitly_wait(20)
myDynamicElement = driver.find_element(By.XPATH, "/html/body/div[9]/div[1]/div/div/div[3]/button").click()

source = driver.execute_script("return document.documentElement.outerHTML")

soup = BeautifulSoup(source, 'lxml')

driver.quit()

table = soup.find('table', {"id":"standings-16548-grid"})
table_rows = table.find_all('tr')
for tr in table_rows:
    td = tr.find_all('tr')
    row = [i.text for i in td]
    print(row)

The output of this code is:

Traceback (most recent call last):
  File "/Users/Gina/PycharmProjects/Prova1/DriverProva/SeleniumScrape.py", line 12, in <module>
    myDynamicElement = driver.find_element(By.XPATH, "/html/body/div[9]/div[1]/div/div/div[3]/button").click()
  File "/Users/Gina/PycharmProjects/Prova1/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "/Users/Gina/PycharmProjects/Prova1/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
    return self._parent.execute(command, params)
  File "/Users/Gina/PycharmProjects/Prova1/venv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/Gina/PycharmProjects/Prova1/venv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotInteractableException: Message: Element could not be scrolled into view

Process finished with exit code 1

1 answer:

Answer 0: (score: 1)

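Try the code below; it returns the expected output.

  selenium.common.exceptions.ElementNotInteractableException: Message: Element could not be scrolled into view

To avoid this error, click the element through the JavaScript executor instead of calling click() on it directly. I also changed the element's XPath so that it matches the button by its text.

  driver.execute_script("arguments[0].click();", element)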


from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import time

driver = webdriver.Firefox(executable_path='/Applications/Python3.7/geckodriver')
driver.get('https://www.whoscored.com/Regions/108/Tournaments/5/Italy-Serie-A')

# Wait up to 20 seconds for the "Continue Using Site" button to become clickable.
element = WebDriverWait(driver, 20).until(
    ec.element_to_be_clickable((By.XPATH, "//button[contains(.,'Continue Using Site')]")))
# Click through JavaScript to avoid the "could not be scrolled into view" error.
driver.execute_script("arguments[0].click();", element)
time.sleep(3)  # pause briefly after the click before reading the page source
source = driver.page_source
soup = BeautifulSoup(source, 'lxml')
driver.quit()

table = soup.find('table', {"id": "standings-16548-grid"})
table_rows = table.find_all('tr')

# Skip the first five <tr> elements and read the <td> cells of each remaining row.
for tr in table_rows[5:]:
    row = [i.text for i in tr.find_all('td')]
    print(row)

Output

['1', 'Juventus', '38', '28', '6', '4', '70', '30', '+40', '90', 'wddldl']
['2', 'Napoli', '38', '24', '7', '7', '74', '36', '+38', '79', 'lwwwwl']
['3', 'Atalanta', '38', '20', '9', '9', '77', '46', '+31', '69', 'wwwwdw']
['4', 'Inter', '38', '20', '9', '9', '57', '33', '+24', '69', 'dddwlw']
['5', 'AC Milan', '38', '19', '11', '8', '55', '36', '+19', '68', 'dlwwww']
['6', 'Roma', '38', '18', '12', '8', '66', '48', '+18', '66', 'dwdwdw']
['7', 'Torino', '38', '16', '15', '7', '52', '37', '+15', '63', 'wwdwlw']
['8', 'Lazio', '38', '17', '8', '13', '56', '46', '+10', '59', 'lwlwdl']
['9', 'Sampdoria', '38', '15', '8', '15', '60', '51', '+9', '53', 'lldldw']
['10', 'Bologna', '38', '11', '11', '16', '48', '56', '-8', '44', 'wwlwdw']
['11', 'Sassuolo', '38', '9', '16', '13', '53', '60', '-7', '43', 'dwdldl']
['12', 'Udinese', '38', '11', '10', '17', '39', '53', '-14', '43', 'dldwww']
['13', 'SPAL 2013', '38', '11', '9', '18', '44', '56', '-12', '42', 'wdwlll']
['14', 'Parma Calcio 1913', '38', '10', '11', '17', '41', '61', '-20', '41', 'dddlwl']
['15', 'Cagliari', '38', '10', '11', '17', '36', '54', '-18', '41', 'wllldl']
['16', 'Fiorentina', '38', '8', '17', '13', '47', '45', '+2', '41', 'llllld']
['17', 'Genoa', '38', '8', '14', '16', '39', '57', '-18', '38', 'lddldd']
['18', 'Empoli', '38', '10', '8', '20', '51', '70', '-19', '38', 'llwwwl']
['19', 'Frosinone', '38', '5', '10', '23', '29', '69', '-40', '25', 'lldlld']
['20', 'Chievo', '38', '2', '14', '22', '25', '75', '-50', '17', 'wdlldd']
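
Apart from the click error, the question's loop looks for 'tr' inside each row (tr.find_all('tr')), while the answer's loop looks for 'td'. Rows do not contain nested rows, so the first form yields an empty list for every row, which seems to match the "tr completely empty" symptom described in the question. The sketch below illustrates the difference on a small inline table. The sample HTML, the id standings-sample and the helper extract_rows are hypothetical and only for illustration; the real WhoScored markup is more complex. It assumes the bs4 package is installed and uses the built-in "html.parser" backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; not the real WhoScored markup.
SAMPLE_HTML = """
<table id="standings-sample">
  <tr><th>R</th><th>Team</th><th>Pts</th></tr>
  <tr><td>1</td><td>Juventus</td><td>90</td></tr>
  <tr><td>2</td><td>Napoli</td><td>79</td></tr>
</table>
"""


def extract_rows(html, table_id, cell_tag):
    """Return one list of cell texts per <tr>, searching each row for cell_tag."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        # The table is missing, for example because the page had not rendered it yet.
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(cell_tag)]
        for tr in table.find_all("tr")
    ]


# Searching for 'tr' inside a row finds nothing, so every row comes back empty.
rows_with_tr = extract_rows(SAMPLE_HTML, "standings-sample", "tr")
# Searching for 'td' returns the data cells; the header row has only <th> cells.
rows_with_td = extract_rows(SAMPLE_HTML, "standings-sample", "td")

print(rows_with_tr)
print(rows_with_td)
```

Rows that contain no td cells, such as the header row in this sample, still come back as empty lists, so they need to be skipped or filtered out before further processing.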