Using Requests and lxml to get the href values of table rows

Posted: 2016-11-05 17:48:34

Tags: python python-3.x lxml

Python 3

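I'm having a hard time iterating through the rows of a table.

How can I make the tr[1] part of the teamName, teamState and teamLink XPaths step through every row of the table body?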

import lxml.html
from lxml.etree import XPath
url = "http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm"

rows_xpath = XPath('//*[@id="rankings"]/tbody')
teamName_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/th/a/text()')
teamState_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/td[2]/text()')
teamLink_xpath = XPath('//*[@id="rankings"]/tbody/tr[1]/th/a/@href')

html = lxml.html.parse(url)

for row in rows_xpath(html):
    teamName = teamName_xpath(row)
    teamState = teamState_xpath(row)
    teamLink = teamLink_xpath(row)
    print (teamName, teamLink)
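In this first attempt, rows_xpath selects the tbody element instead of its tr rows, and each per-row expression starts with //, which makes it absolute: calling it with row still searches the whole document. The following is a minimal sketch that keeps the compiled XPath objects but makes the per-row paths relative. It assumes a small made-up table shaped like the one in the question; the team names and links are hypothetical sample data, and scrape_rows and first are illustrative helper names.

```python
import lxml.html
from lxml.etree import XPath

# Hypothetical sample markup shaped like the rankings table (not real data).
SAMPLE_HTML = """
<html><body>
<table id="rankings">
  <tbody>
    <tr><th><a href="/team-a.htm">Team A</a></th><td>1</td><td class="state">CA</td></tr>
    <tr><th><a href="/team-b.htm">Team B</a></th><td>2</td><td class="state">MD</td></tr>
  </tbody>
</table>
</body></html>
"""

# Select the <tr> elements themselves, not the <tbody>.
rows_xpath = XPath('//*[@id="rankings"]/tbody/tr')

# No leading //, so these are evaluated relative to the row passed in.
teamName_xpath = XPath('th/a/text()')
teamState_xpath = XPath('td[2]/text()')
teamLink_xpath = XPath('th/a/@href')

# For contrast: a path starting with // searches the whole document,
# whatever element it is called with.
absoluteName_xpath = XPath('//*[@id="rankings"]/tbody/tr/th/a/text()')


def first(matches, default=""):
    """XPath calls return lists; take the first match, or a default if empty."""
    return matches[0] if matches else default


def scrape_rows(doc):
    """Return one (name, state, link) tuple per table row."""
    return [
        (first(teamName_xpath(row)), first(teamState_xpath(row)), first(teamLink_xpath(row)))
        for row in rows_xpath(doc)
    ]


doc = lxml.html.fromstring(SAMPLE_HTML)
for teamName, teamState, teamLink in scrape_rows(doc):
    print(teamName, teamState, teamLink)
```

Calling absoluteName_xpath with only the first row still returns both team names, which is why the per-row expressions need to be relative.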

I also tried the following:

from lxml import html
import requests

siteItem = ['http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm']

def linkScrape(target):
    page = requests.get(target)
    tree = html.fromstring(page.content)

    # Get team link
    for link in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/th/a/@href'):
        print(link)
    # Get team name
    for name in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/th/a/text()'):
        print(name)
    # Get team state
    for state in tree.xpath('//*[@id="rankings"]/tbody/tr[1]/td[2]/text()'):
        print(state)

for target in siteItem:
    linkScrape(target)
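If you want to keep the tr[1]-style paths from this second attempt and only vary the index, one option is to count the rows with XPath's count() function and substitute each position into the expression. This is a sketch under stated assumptions. It runs against made-up sample HTML instead of the live page, and scrape_by_index and ROWS are hypothetical names. With the real page you would build tree with html.fromstring(page.content), as in the code above.

```python
import lxml.html

# Hypothetical sample markup shaped like the rankings table (not real data).
SAMPLE_HTML = """
<html><body>
<table id="rankings">
  <tbody>
    <tr><th><a href="/team-a.htm">Team A</a></th><td>1</td><td class="state">CA</td></tr>
    <tr><th><a href="/team-b.htm">Team B</a></th><td>2</td><td class="state">MD</td></tr>
  </tbody>
</table>
</body></html>
"""

ROWS = '//*[@id="rankings"]/tbody/tr'


def scrape_by_index(tree):
    """Return (name, state, link) tuples by querying tr[1], tr[2], ... in turn."""
    # lxml returns the result of count() as a Python float, e.g. 2.0.
    n_rows = int(tree.xpath(f'count({ROWS})'))
    results = []
    # XPath positions are 1-based: tr[1] is the first row, tr[n_rows] the last.
    for i in range(1, n_rows + 1):
        row = f'{ROWS}[{i}]'
        names = tree.xpath(f'{row}/th/a/text()')
        states = tree.xpath(f'{row}/td[2]/text()')
        links = tree.xpath(f'{row}/th/a/@href')
        results.append((
            names[0] if names else "",
            states[0] if states else "",
            links[0] if links else "",
        ))
    return results


tree = lxml.html.fromstring(SAMPLE_HTML)
for name, state, link in scrape_by_index(tree):
    print(name, state, link)
```

This approach runs a whole-document query for every field of every row. Looping over the row elements and using relative paths, as the answer does, reaches the same data with less work.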

Thanks for looking :D

1 Answer

Answer (score: 0)

If I understand what you're asking, you want to iterate over the rows of the rankings table. So start with a loop over those rows:

import lxml.html
doc = lxml.html.parse('http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm')

for row in doc.xpath('//table[@id="rankings"]/tbody/tr'):

This iterates over every row of the table body. Now, for each row, extract the data you need:

    team_link = row.xpath('th/a/@href')[0]
    team_name = row.xpath('th/a/text()')[0]
    team_state = row.xpath('td[contains(@class, "state")]/text()')[0]
    print(team_state, team_name, team_link)

On my system, this produces the following output:

CA Manteca /high-schools/manteca-buffaloes-(manteca,ca)/basketball-winter-15-16/rankings.htm
MD Mount St. Joseph (Baltimore) /high-schools/mount-st-joseph-gaels-(baltimore,md)/basketball-winter-15-16/rankings.htm
TX Brandeis (San Antonio) /high-schools/brandeis-broncos-(san-antonio,tx)/basketball-winter-15-16/rankings.htm
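The links in that output are site-relative. If you need full URLs, one option is to resolve each href against the page URL with urllib.parse.urljoin. The following is a minimal sketch that combines this with the answer's XPath expressions and parses bytes (for example, a Requests response body), matching the question's title. It assumes the table still has the markup those expressions rely on; the live page may have changed since 2016. scrape_rankings is a hypothetical helper name, and the skip-rows-without-a-link rule is an added assumption.

```python
from urllib.parse import urljoin

import lxml.html

PAGE_URL = "http://www.maxpreps.com/rankings/basketball-winter-15-16/7/national.htm"


def scrape_rankings(html_bytes, base_url=PAGE_URL):
    """Return (state, name, absolute_link) for each row of the rankings table."""
    doc = lxml.html.fromstring(html_bytes)
    results = []
    for row in doc.xpath('//table[@id="rankings"]/tbody/tr'):
        links = row.xpath('th/a/@href')
        names = row.xpath('th/a/text()')
        states = row.xpath('td[contains(@class, "state")]/text()')
        if not links:
            # Assumption: rows without a team link (e.g. spacer rows) are skipped.
            continue
        results.append((
            states[0].strip() if states else "",
            names[0].strip() if names else "",
            # "/high-schools/..." becomes "http://www.maxpreps.com/high-schools/..."
            urljoin(base_url, links[0]),
        ))
    return results
```

To run it against the live page with Requests, pass the response body: scrape_rankings(requests.get(PAGE_URL).content). As an alternative to urljoin, lxml.html elements also have a make_links_absolute(base_url) method that rewrites every link in the tree before you extract them.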