First attempt:
classes = tree.xpath('//*[@data-status="active"]/text()')
print ('Classes: ', classes)
Returns:
['Active',
 '\n ', '\n\n ', '\n\n ', '\n\n ', '\n ', '\n\n ', '\n\n ', '\n\n ',
 '\n ', '\n\n ', '\n\n ', '\n\n ', '\n ', '\n\n ', '\n\n ', '\n\n ',
 '\n ', '\n\n ', '\n\n ', '\n\n ', '\n ', '\n\n ', '\n\n ', '\n\n ',
 '\n ', '\n\n ', '\n\n ', '\n\n ', '\n ', '\n\n ', '\n\n ', '\n\n ']
I'm fairly sure I should be getting every class whose data-status is "active", not a bunch of newlines.
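For context, here is a sketch of why this happens, using invented markup (the class names `status-filter`, `course`, `title`, `class-name`, `numeric-grade` and the grades are assumptions, not taken from the real page). The `/text()` step returns only the text nodes that are direct children of each matched element, so an element whose children are all tags contributes just the whitespace between those tags. Dropping whitespace-only strings is one common way to clean up the result:

```python
from lxml import html

# Hypothetical markup: an "Active" label plus two course blocks that also
# carry data-status="active". Tag and class names are invented for
# illustration; the real page may be structured differently.
PAGE = """
<html><body>
  <div class="status-filter">
    <span data-status="active">Active</span>
  </div>
  <div class="course" data-status="active">
    <div class="title">
      <span class="class-name">Digital Portfolio Grade 9</span>
    </div>
    <span class="numeric-grade">95</span>
  </div>
  <div class="course" data-status="active">
    <div class="title">
      <span class="class-name">English 1 H</span>
    </div>
    <span class="numeric-grade">91</span>
  </div>
</body></html>
"""

tree = html.fromstring(PAGE)

# /text() selects only the *direct* text children of each matched element.
# For the course <div>s these are the whitespace runs between child tags.
direct = tree.xpath('//*[@data-status="active"]/text()')

# Dropping whitespace-only entries leaves just the label text.
direct_clean = [t.strip() for t in direct if t.strip()]
print(direct_clean)  # ['Active']

# //text() selects text at any depth: the course names appear, but so do
# the grades, so a more specific path is needed to get titles only.
deep = tree.xpath('//*[@data-status="active"]//text()')
deep_clean = [t.strip() for t in deep if t.strip()]
print(deep_clean)
# ['Active', 'Digital Portfolio Grade 9', '95', 'English 1 H', '91']
```

Under these assumptions, the course names only show up once the query descends into the child elements.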
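Alternatively, when I extend the XPath down to the actual text, I get an empty array, even though I believe I wrote it correctly.
Second attempt: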
classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')
prints an empty array: []
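One likely reason for the empty result, sketched under an assumption: each step in `course/title/class-name` matches an element by its tag name, so the path only finds something if the page really contains `<course>`, `<title>` and `<class-name>` tags. If those are actually class names (the markup below is hypothetical), the steps have to test the `class` attribute instead:

```python
from lxml import html

# Hypothetical markup: "course", "title" and "class-name" appear here as
# class attributes, not as tag names (an assumption about the real page).
PAGE = """
<html><body>
  <div class="course" data-status="active">
    <div class="title">
      <span class="class-name">Algebra 1H</span>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Every step in this path is a tag name: it looks for <course>, then
# <title>, then <class-name> children, none of which exist here.
by_tag = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')
print(by_tag)  # []

# Match on the class attribute instead. The concat/normalize-space idiom
# checks for "class-name" as a whole, space-separated class token.
by_class = tree.xpath(
    '//*[@data-status="active"]'
    '//*[contains(concat(" ", normalize-space(@class), " "), " class-name ")]'
    '/text()'
)
print(by_class)  # ['Algebra 1H']
```

The `concat(" ", normalize-space(@class), " ")` idiom matches `class-name` as a whole token even when an element has several classes; a plain `@class="class-name"` test only works when that is the element's sole class.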
Full code:
from appJar import gui
from splinter import Browser
from lxml import html
import requests
browser = Browser('chrome', headless=True)
browser.visit('www.sitelogin.com') #Access
browser.fill('username', 'johndoe') #Login
browser.fill('password', 'pass1234') #Login
button = browser.find_by_name('commit') #Login
button.click() #Login
divs = browser.find_by_id("child-89751")
within = divs.first.find_by_name('calculated-grade')
if browser.is_text_present('Current Class Schedule'):
    print("Success")
    print(within)
page = browser.html
tree = html.fromstring(page)
classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')
grades = tree.xpath('//span[@class="numeric-grade"]/text()')
print ('Classes: ', classes)
print ('Grades: ', grades)
Expected output (all of the active classes on the page; see here: https://imgur.com/a/2C0n1). The titles are the large blue text. Expected:
Digital Portfolio Grade 9, English 1 H, Algebra 1H etc.
Answer 0 (score: 0)
The \n characters are part of the page string you pass to html.fromstring(page).
You can strip them out by running:
page = ' '.join(page.split('\n'))  # replace every newline with a space
tree = html.fromstring(page)
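A small check of what this step does, on a hypothetical one-div page: the newlines become spaces, but whitespace-only text nodes are not removed, so `/text()` can still return space-only strings. The `strip()` filter at the end is an addition for illustration, not part of the answer above:

```python
from lxml import html

# Hypothetical stand-in for browser.html: a div with a line break and
# indentation around its only child element.
page = '<html><body><div data-status="active">\n  <span>Active</span>\n</div></body></html>'
query = '//*[@data-status="active"]/text()'

before = html.fromstring(page).xpath(query)

# The answer's step: turn every newline into a space before parsing.
page = ' '.join(page.split('\n'))
after = html.fromstring(page).xpath(query)

# Any whitespace-only text nodes now hold spaces instead of newlines;
# they are not removed, so they still need filtering out.
print(any('\n' in t for t in after))  # False
cleaned = [t.strip() for t in after if t.strip()]
print(cleaned)  # []
```

Whether or not the newlines are replaced first, filtering on `t.strip()` is what actually drops the blank entries.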