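Why can't this class be found with this CSS selector?

Time: 2019-08-21 08:38:05

Tags: python xpath beautifulsoup scrapy css-selectors

I'm trying to use an XPath shortcut or a CSS selector to find every element on the page that matches this: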

<span class="perWord ng-binding">$0.20</span>

I'm still trying to get my head around selectors, but here is what I've tried so far:

(Pdb) selector.css('.perWord').getall()
[]
(Pdb) selector.css('.perWord')
[]
(Pdb) selector.css('perWord')
[]
(Pdb) selector.css('ng-binding')
[]
(Pdb) selector.css('perWord ng-binding')
[]
(Pdb) selector.css('.perWord_ng-binding')
[]
(Pdb) selector.css('.ng-binding').getall()
['<title ng-bind-template="100 Days In Appalachia | Who Pays Writers? " class="ng-binding">100 Days In Appalachia | Who Pays Writers? </title>', '<div ng-bind="venue.name" class="pull-left ng-binding">100 Days In Appalachia</div>', '<div class="pull-right small grayLighter ng-binding"> report<span ng-bind="GrammarHelper.pluralS(interactions.length)" class="ng-binding"></span> </div>', '<span ng-bind="GrammarHelper.pluralS(interactions.length)" class="ng-binding"></span>']

Here is the site and the code I'm using:

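import pdb
from scrapy.selector import Selector  # assumed: the Scrapy Selector, going by the question's tags
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('http://whopayswriters.com/#/publication/100-days-in-appalachia')
selector = Selector(text=driver.page_source)
pdb.set_trace()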

I expect it to return all five instances that appear on the page:

<span class="perWord ng-binding">$0.20</span>

3 answers:

Answer 0: (score: 0)

This works with Selenium:

from selenium import webdriver
import time

driver = webdriver.Chrome('chromedriver.exe')
driver.get('http://whopayswriters.com/#/publication/100-days-in-appalachia')
time.sleep(3)
elems = driver.find_elements_by_class_name("perWord")

If you want to try this, add time.sleep(3) to your code. Sometimes the page has not finished loading yet, so the elements cannot be found.
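A fixed sleep either waits too long or not long enough, so polling until the elements show up is usually more reliable. Selenium ships WebDriverWait for exactly this. The sketch below is not that API; it is a hand-rolled, standard-library-only helper that illustrates the same polling idea. The name wait_until and its parameters are made up for this example.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage with the driver from the answer above (not executed here):
# elems = wait_until(lambda: driver.find_elements_by_class_name("perWord"))
```

An empty list is falsy, so the helper keeps polling until at least one element is returned, then hands back the list.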

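Answer 1: (score: 0)

The data is added dynamically through an XHR call that returns JSON, so requests alone is enough. You can compute the per-word rate for each entry from the returned JSON. The call can be found in the browser's Network tab. If you need to link the values back to their entries, you can add the id from the JSON.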

import requests

r = requests.get('http://whopayswriters.com/reports/public?design=cf&view=interaction_venues&key=%22f6c531bac691fa7846cb0b0c4b081a08%22&reduce=false&include_docs=true').json()
per_word = ['$' + str(round(int(i['doc']['compensation']['Stipend / Honoraria / Fee'].replace('$',''))/i['doc']['pieceLength'],2)) for i in r['rows']]
print(per_word)

For example, you can key each value by its piece length:

per_word = {i['doc']['pieceLength']:'$' + str(round(int(i['doc']['compensation']['Stipend / Honoraria / Fee'].replace('$',''))/i['doc']['pieceLength'],2)) for i in r['rows']}
print(per_word)
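The one-liners above call int() on the fee and divide by pieceLength directly, which would fail on a fee such as '$1,500', a missing value, or a zero length. Below is a minimal sketch of a more defensive version. The helper name per_word_rate is hypothetical, and the sample rows are invented; they only mimic the field names used above and are not real data from the site.

```python
def per_word_rate(fee, piece_length):
    """Return the per-word rate as a string like '$0.20', or None if it cannot be computed."""
    try:
        # Strip the currency symbol and thousands separators before converting.
        amount = float(str(fee).replace('$', '').replace(',', ''))
        words = float(piece_length)
    except (TypeError, ValueError):
        return None
    if words <= 0:
        return None
    return '${:.2f}'.format(amount / words)


# Invented sample rows that only imitate the shape of the JSON used above.
sample_rows = [
    {'doc': {'compensation': {'Stipend / Honoraria / Fee': '$200'}, 'pieceLength': 1000}},
    {'doc': {'compensation': {'Stipend / Honoraria / Fee': '$1,500'}, 'pieceLength': 0}},
]

rates = [
    per_word_rate(
        row['doc']['compensation'].get('Stipend / Honoraria / Fee'),
        row['doc'].get('pieceLength'),
    )
    for row in sample_rows
]
print(rates)
```

Note also that keying a dict by pieceLength keeps only one entry when two reports share the same length, which is one reason to key by the id instead.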

Answer 2: (score: 0)

Try using WebDriverWait or sleep in your code to give that particular request more time to load.

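Also, the content comes from a dynamic request, so the elements are not in the initial page source. As a result, the Scrapy selector will not find them in the response. You should use something that handles dynamic requests, such as Selenium or Splash.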
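On the selector syntax itself: a class selector matches any element whose class attribute contains that token, so '.perWord' (or '.perWord.ng-binding' for both classes) is the right form, whereas 'perWord' looks for a tag named perWord. That means an empty result for '.perWord' points to the element being absent from the HTML that was parsed. The sketch below checks this with only the standard library. ClassCounter and count_class are hypothetical names, and the two HTML strings are small stand-ins based on the markup quoted in the question.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated list of tokens.
        classes = (dict(attrs).get('class') or '').split()
        if self.token in classes:
            self.count += 1


def count_class(html, token):
    parser = ClassCounter(token)
    parser.feed(html)
    return parser.count


# Stand-in snippets: one with the rendered span, one without it.
rendered = '<span class="perWord ng-binding">$0.20</span>'
unrendered = '<div class="pull-left ng-binding">100 Days In Appalachia</div>'

print(count_class(rendered, 'perWord'))
print(count_class(unrendered, 'perWord'))
```

Running the same check on driver.page_source right after driver.get() would show whether the spans had been rendered at the moment the source was captured.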