I am trying to use an XPath shortcut or a CSS selector to find every element on the page that matches this:
<span class="perWord ng-binding">$0.20</span>
I'm struggling to understand selectors, but here is what I have tried so far:
(Pdb) selector.css('.perWord').getall()
[]
(Pdb) selector.css('.perWord')
[]
(Pdb) selector.css('perWord')
[]
(Pdb) selector.css('ng-binding')
[]
(Pdb) selector.css('perWord ng-binding')
[]
(Pdb) selector.css('.perWord_ng-binding')
[]
(Pdb) selector.css('.ng-binding').getall()
['<title ng-bind-template="100 Days In Appalachia | Who Pays Writers? " class="ng-binding">100 Days In Appalachia | Who Pays Writers? </title>', '<div ng-bind="venue.name" class="pull-left ng-binding">100 Days In Appalachia</div>', '<div class="pull-right small grayLighter ng-binding"> report<span ng-bind="GrammarHelper.pluralS(interactions.length)" class="ng-binding"></span> </div>', '<span ng-bind="GrammarHelper.pluralS(interactions.length)" class="ng-binding"></span>']
This is the site and the code I am using:
driver = webdriver.Chrome()
driver.get('http://whopayswriters.com/#/publication/100-days-in-appalachia')
selector = Selector(text = driver.page_source)
pdb.set_trace()
I expect this to return all five instances that appear on the page, each of the form:
<span class="perWord ng-binding">$0.20</span>
Answer 0 (score: 0)
This worked for me using Selenium:
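from selenium import webdriver
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome('chromedriver.exe')
driver.get('http://whopayswriters.com/#/publication/100-days-in-appalachia')
# Give the JavaScript on the page time to render the rates.
time.sleep(3)
# find_elements(By.CLASS_NAME, ...) works in Selenium 3 and 4;
# find_elements_by_class_name was removed in later Selenium 4 releases.
elems = driver.find_elements(By.CLASS_NAME, "perWord")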
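Try adding time.sleep(3) to your own code as well: sometimes the page has not finished loading when the source is read, so the elements cannot be found.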
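Answer 1 (score: 0)
The data is added dynamically by an XHR call that returns JSON, so requests alone is enough. You can compute the per-word rate for each entry from the returned JSON. The call can be found in the browser's Network tab. If you need to link each rate back to its report, you can add the id from the JSON.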
import requests
r = requests.get('http://whopayswriters.com/reports/public?design=cf&view=interaction_venues&key=%22f6c531bac691fa7846cb0b0c4b081a08%22&reduce=false&include_docs=true').json()
per_word = ['$' + str(round(int(i['doc']['compensation']['Stipend / Honoraria / Fee'].replace('$',''))/i['doc']['pieceLength'],2)) for i in r['rows']]
print(per_word)
For example, you could key each rate by the piece length:
per_word = {i['doc']['pieceLength']:'$' + str(round(int(i['doc']['compensation']['Stipend / Honoraria / Fee'].replace('$',''))/i['doc']['pieceLength'],2)) for i in r['rows']}
print(per_word)
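The one-liners above assume that every row has a whole-dollar fee and a non-zero pieceLength, and keying by pieceLength would collapse two reports of the same length into one entry. Below is a minimal sketch of a more defensive version, keyed by id as suggested above. The helper name per_word_rate and the sample rows are made up for illustration; only the field names come from the code above, and the presence of an id field on each row is an assumption based on this answer. It formats to two decimals, so it prints $0.20 where round() would give $0.2.

```python
def per_word_rate(doc):
    """Return a '$x.xx' per-word string for one report, or None if it cannot be computed."""
    fee_text = doc.get("compensation", {}).get("Stipend / Honoraria / Fee")
    length = doc.get("pieceLength")
    if not fee_text or not length:
        # Missing fee, missing length, or a length of 0: nothing to divide.
        return None
    try:
        # Strip the dollar sign and thousands separators before converting.
        fee = float(str(fee_text).replace("$", "").replace(",", ""))
        rate = fee / float(length)
    except (TypeError, ValueError, ZeroDivisionError):
        return None
    return "${:.2f}".format(rate)


# Hypothetical rows shaped like r['rows'] in the answer above; the values are invented.
sample_rows = [
    {"id": "a1", "doc": {"pieceLength": 1000,
                         "compensation": {"Stipend / Honoraria / Fee": "$200"}}},
    {"id": "b2", "doc": {"pieceLength": 0,
                         "compensation": {"Stipend / Honoraria / Fee": "$150"}}},
    {"id": "c3", "doc": {"pieceLength": 800, "compensation": {}}},
]

# Key by id so that two reports with the same length do not overwrite each other.
rates_by_id = {row["id"]: per_word_rate(row["doc"]) for row in sample_rows}
print(rates_by_id)
```

With real data you would pass r['rows'] instead of sample_rows; rows that cannot be computed come back as None rather than raising.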
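Answer 2 (score: 0)
Try using WebDriverWait in your code, or use sleep, to give that request more time to load.
Also, the content comes from a dynamic request, so the elements are not in the initial page source and the Scrapy selector will not find them in the response. You need something that can handle dynamic requests, such as Selenium or Splash.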