Parsing a span class with BeautifulSoup (or, more precisely, XPath)

Time: 2014-05-06 17:58:43

Tags: python parsing html-parsing beautifulsoup

I have:

try:
    page = requests.get(Scrape.site_to_scrape['git']+gitUser)
    tree = urllib.urlopen(page).read()
    soup = BS(response)
    parse_git_full_name = soup.find("span", {"class":"vcard-fullname"}).get_text()
    return parse_git_full_name

except:
    print "Syntax: python site_scrape.py -g <git user name here>"

However, it keeps falling into the except: branch.

I am trying to parse an element like this:

<span class="vcard-fullname" itemprop="name">The name</span>

I am trying to get the value between the <span> tags.

1 Answer:

Answer 0: (score: 1)

I solved this with XPath, using a single selector. Hopefully this helps others who are pulling their hair out over BeautifulSoup selectors.

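import requests
from lxml import html

try:
    page = requests.get(Scrape.site_to_scrape['git']+gitUser)
    # Parse the response body (page.text), not the Response object itself
    tree = html.fromstring(page.text)

    # xpath() returns a list of matching text nodes, not a single string
    full_name = tree.xpath('//span[@class="vcard-fullname"]/text()')

    print 'Full Name: ', full_name

except:
    print "Syntax: python site_scrape.py -g <git user name here>"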
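
As far as can be told from the snippet in the question, the original attempt probably fails before soup.find is ever reached: urllib.urlopen is handed the requests response object rather than a URL, and BS(response) refers to a name that is never defined. The bare except then hides the real error behind the usage message. Below is a minimal sketch, written for Python 3, of both lookups run against a static string so it needs no network access. SAMPLE_HTML, full_name_bs and full_name_xpath are hypothetical names introduced here for illustration; the sample markup is the span from the question wrapped in a div.

```python
from bs4 import BeautifulSoup
from lxml import html

# Stand-in for page.text; in the real script this would come from
# requests.get(...).text (assumed, not shown here).
SAMPLE_HTML = (
    '<div><span class="vcard-fullname" itemprop="name">The name</span></div>'
)


def full_name_bs(markup):
    """Return the text of the first span.vcard-fullname, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")
    span = soup.find("span", {"class": "vcard-fullname"})
    # find() returns None when nothing matches, so guard before get_text()
    return span.get_text() if span is not None else None


def full_name_xpath(markup):
    """Same lookup with lxml; xpath() returns a list, so take the first item."""
    tree = html.fromstring(markup)
    matches = tree.xpath('//span[@class="vcard-fullname"]/text()')
    return matches[0] if matches else None


if __name__ == "__main__":
    print("BeautifulSoup:", full_name_bs(SAMPLE_HTML))
    print("XPath:", full_name_xpath(SAMPLE_HTML))
```

While debugging, catching a specific exception (or temporarily removing the try/except) lets the actual traceback show, which would have pointed at the failing line rather than at the selector.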