Question

First attempt:

classes = tree.xpath('//*[@data-status="active"]/text()')

print ('Classes: ', classes)

This returns:

['Active',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ',
 '\n        ', '\n\n        ', '\n\n        ', '\n\n      ']

I'm pretty sure I should be getting all the classes whose data-status is "active", not a bunch of newlines.

Alternatively, when I extend the XPath down to the actual text, I get an empty array, even though I believe the second attempt is correct:

classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')

This prints an empty array, '[]'.

Full code:

from appJar import gui
from splinter import Browser
from lxml import html
import requests
browser = Browser('chrome', headless=True)


browser.visit('www.sitelogin.com')  #Access
browser.fill('username', 'johndoe') #Login
browser.fill('password', 'pass1234') #Login
button = browser.find_by_name('commit') #Login
button.click() #Login

divs = browser.find_by_id("child-89751")
within = divs.first.find_by_name('calculated-grade')
if browser.is_text_present('Current Class Schedule'):
    print("Success")
    print(within)
page = browser.html
tree = html.fromstring(page)
classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')
grades = tree.xpath('//span[@class="numeric-grade"]/text()')

print ('Classes: ', classes)
print ('Grades: ', grades)

Expected output: all active classes on the page (see here: https://imgur.com/a/2C0n1). The titles are the large blue text. Expected:

Digital Portfolio Grade 9, English 1 H, Algebra 1H etc.

Answer 1

The \n characters are part of the string you pass to html.fromstring(page).
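
A minimal sketch of where those strings come from, assuming a small made-up fragment (the markup below is hypothetical, not the real page). The line breaks and indentation between child tags are themselves text nodes of the matched element, so text() returns them alongside any real text. The normalize-space() predicate is one way to keep only the nodes with visible text:

```python
from lxml import html

# Hypothetical markup; the real page's structure is not shown in the question.
snippet = """<div data-status="active">Active
    <span>English 1 H</span>
    <span>Algebra 1H</span>
</div>"""

tree = html.fromstring(snippet)

# text() selects only the direct text-node children of the matched element,
# which includes the whitespace sitting between the child tags.
raw = tree.xpath('//*[@data-status="active"]/text()')

# The normalize-space() predicate drops nodes that contain only whitespace.
visible = tree.xpath('//*[@data-status="active"]/text()[normalize-space()]')

print('Raw:', raw)
print('Visible:', [t.strip() for t in visible])
```

Note that text() never descends into the child elements, which is why the span contents do not show up in either list.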

You can get rid of them by running:

page = ' '.join(page.split('\n'))  # replace every newline with a space
tree = html.fromstring(page)
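
Joining on '\n' replaces each line break with a space, so strings made up only of spaces may still appear in the result. If the goal is simply to drop the whitespace-only entries, another option (an alternative, not what the snippet above does) is to filter the list returned by xpath. A sketch using a shortened, made-up copy of the result from the question:

```python
# Shortened stand-in for the list returned by tree.xpath(...) in the question.
classes = ['Active', '\n        ', '\n\n        ', '\n\n      ']

# str.strip() turns a whitespace-only string into '', which is falsy,
# so the comprehension keeps only entries that contain visible text.
cleaned = [text.strip() for text in classes if text.strip()]

print('Classes:', cleaned)
```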
