Unable to get certain columns from a table

Time: 2018-08-07 07:28:32

Tags: python python-3.x selenium selenium-webdriver web-scraping

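I've written a script in Python combined with Selenium to parse certain fields from a table on a webpage. The fields I'm interested in are under the headers Home and Handicap. I can get the content under Home, but I can't get the content under Handicap. How can I get it?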

This is my attempt so far:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get("http://info.nowgoal.com/en/League/2018-2019/36.html")
time.sleep(3)  # intentional delay to let the webpage load its content
soup = BeautifulSoup(driver.page_source,"lxml")
for items in soup.select('table#Table3 tr'):
    name = items.find_all("td")[2].text
    # stat = items.find_all("td")[5].text  #this is not working
    print(name)
driver.quit()

1 Answer:

Answer 0 (score: 2)

The first two rows are just headers. To get the values, you need to loop over every row except the first two:

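for items in soup.select('table#Table3 tr')[2:]:  # skip the two header rows
    cells = items.find_all("td")  # look up the row's cells once
    name = cells[2].text     # Home column
    stat_ft = cells[5].text  # first Handicap column
    stat_ht = cells[6].text  # second Handicap column
    print(name, stat_ft, stat_ht)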
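
If you would rather not hard-code the number of header rows, one alternative (not what the snippet above does) is to skip any row that has too few cells. Below is a minimal sketch of that idea. The `SAMPLE_HTML` table, the team names, and the `parse_rows` helper are all made up for illustration and only assume the same column positions (2, 5, 6) used above; the real page may be laid out differently. It uses the built-in `html.parser` so it runs without Selenium or lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table: two header rows, then two data rows.
SAMPLE_HTML = """
<table id="Table3">
  <tr><td>Round</td><td>Time</td><td>Home</td><td>Score</td><td>Away</td><td colspan="2">Handicap</td></tr>
  <tr><td>FT</td><td>HT</td></tr>
  <tr><td>1</td><td>08-10</td><td>Team A</td><td>2-1</td><td>Team B</td><td>0.5</td><td>0.25</td></tr>
  <tr><td>1</td><td>08-11</td><td>Team C</td><td>0-0</td><td>Team D</td><td>1</td><td>0.5</td></tr>
</table>
"""


def parse_rows(html, home_idx=2, ft_idx=5, ht_idx=6):
    """Return (home, ft, ht) tuples, skipping rows that lack the needed cells."""
    soup = BeautifulSoup(html, "html.parser")
    needed = max(home_idx, ft_idx, ht_idx)
    results = []
    for row in soup.select("table#Table3 tr"):
        cells = row.find_all("td")
        # Rows with too few cells (the header rows in this sample) are ignored
        # instead of raising IndexError.
        if len(cells) <= needed:
            continue
        results.append((
            cells[home_idx].get_text(strip=True),
            cells[ft_idx].get_text(strip=True),
            cells[ht_idx].get_text(strip=True),
        ))
    return results


for home, stat_ft, stat_ht in parse_rows(SAMPLE_HTML):
    print(home, stat_ft, stat_ht)
```

With the real page you would pass `driver.page_source` instead of `SAMPLE_HTML`. The length check also protects against any short separator rows further down the table, which slicing with `[2:]` alone would not.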