Getting a bunch of \n when I try to pull text

Time: 2017-10-07 23:58:32

Tags: python

First attempt:

classes = tree.xpath('//*[@data-status="active"]/text()')

print ('Classes: ', classes)

This returns:

['Active', '\n        ', '\n\n        ', '\n\n        ', '\n\n      ', '\n        ', '\n\n        ', '\n\n        ', '\n\n
  ', '\n        ', '\n\n        ', '\n\n        ', '\n\n      ', '\n        ', '\n\n        ', '\n\n        ', '\n\n      ', '\n
', '\n\n        ', '\n\n        ', '\n\n      ', '\n        ', '\n\n        ', '\n\n        ', '\n\n      ', '\n        ', '\n\n
', '\n\n        ', '\n\n      ', '\n        ', '\n\n        ', '\n\n        ', '\n\n      ']

I'm pretty sure I should be getting all the classes whose data-status is "active", not a bunch of newlines.
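A minimal sketch of what may be happening in the first attempt. The markup below is hypothetical (the real page's HTML is not shown in the question). It only illustrates that `/text()` selects the text nodes that are direct children of the matched element, which are often just the indentation between child tags, while `//text()` also descends into the children.

```python
from lxml import html

# Hypothetical markup: the structure of the real page is an assumption.
SAMPLE = (
    '<div data-status="active">\n'
    '    <h3 class="class-name">English 1 H</h3>\n'
    '    <span class="numeric-grade">93</span>\n'
    '</div>'
)

tree = html.fromstring(SAMPLE)

# /text() returns only the div's own text nodes: here, the whitespace
# (newlines and indentation) that sits between its child elements.
direct_text = tree.xpath('//*[@data-status="active"]/text()')

# //text() also returns text nodes nested inside the child elements.
all_text = tree.xpath('//*[@data-status="active"]//text()')

print('Direct text nodes:', direct_text)
print('All text nodes:', all_text)
```

If the real page is structured along these lines, the visible titles live in descendant elements rather than directly under the element carrying data-status.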

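Alternatively, when I extend the XPath down to the actual text, I get an empty array, even though I believe I wrote it correctly. Second attempt: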

classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')

This prints an empty array: '[]'

Full code:

from appJar import gui
from splinter import Browser
from lxml import html
import requests
browser = Browser('chrome', headless=True)


browser.visit('www.sitelogin.com')  #Access
browser.fill('username', 'johndoe') #Login
browser.fill('password', 'pass1234') #Login
button = browser.find_by_name('commit') #Login
button.click() #Login

divs = browser.find_by_id("child-89751")
within = divs.first.find_by_name('calculated-grade')
if browser.is_text_present('Current Class Schedule'):
    print("Success")
    print(within)
page = browser.html
tree = html.fromstring(page)
classes = tree.xpath('//*[@data-status="active"]/course/title/class-name/text()')
grades = tree.xpath('//span[@class="numeric-grade"]/text()')

print ('Classes: ', classes)
print ('Grades: ', grades)

Expected output: all the active classes on the page (see https://imgur.com/a/2C0n1; the titles are the large blue text). Expected:

Digital Portfolio Grade 9, English 1 H, Algebra 1H etc.

1 answer:

Answer 0 (score: 0)

The \n characters are part of the string you pass to html.fromstring(page).

You can get rid of them by running:

page = ' '.join(page.split('\n'))  # replace every newline with a space
tree = html.fromstring(page)
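Note that joining on '\n' replaces each newline with a space, so whitespace-only text nodes may still come back as runs of spaces. As a complementary sketch (not part of the original answer), the query results can be stripped and filtered after the fact. The helper name `clean_text_nodes` and the sample markup are made up for illustration. In the question's code, `page` would be `browser.html`.

```python
from lxml import html


def clean_text_nodes(nodes):
    """Strip each text node and drop the ones that are empty afterwards."""
    return [text.strip() for text in nodes if text.strip()]


# Hypothetical stand-in for browser.html; the real markup is not known.
page = '<div data-status="active">Active\n  <p>\n  Algebra 1H\n  </p>\n</div>'
tree = html.fromstring(page)

# Collect every text node under the active element, then clean them up.
raw = tree.xpath('//*[@data-status="active"]//text()')
cleaned = clean_text_nodes(raw)

print('Classes: ', cleaned)
```

Filtering after the query leaves the page source untouched, which avoids altering text where a newline is meaningful (for example inside a pre element).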