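I am trying to extract the rating numbers with the code below. I get an index out of range error. I need to get both the overall rating and the sub-ratings.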
from selenium import webdriver
import pandas as pd
import time
import re
init_url = 'https://www.glassdoor.co.in/Reviews/DXC-Technology-Reviews-E1603125.htm'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(init_url)
time.sleep(5)
i = 0
while i < 11:
    rate1 = driver.find_elements_by_xpath("//*[@class='rating']")
    rate = driver.find_element_by_xpath("//input[@title='3.0']")[i]
    print(rate.text)
    i += 1
Answer 0 (score: 1)
To extract the rating number, you can use either of the following solutions:
xpath:
rating = driver.find_element_by_xpath("//div[@class='ratingsSummary cf']//span[@class='bigRating strong margRtSm h2']").get_attribute("innerHTML")
css_selector:
rating = driver.find_element_by_css_selector("div.ratingsSummary.cf span.bigRating.strong.margRtSm.h2").get_attribute("innerHTML")
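Both calls return the element's inner HTML as a string, so the value usually needs converting before it can be compared or averaged. A minimal sketch, assuming the string contains a single decimal number such as "3.3" (the helper name parse_rating is hypothetical, not part of Selenium):

```python
import re


def parse_rating(raw):
    """Return the first decimal number found in raw as a float, or None."""
    # Matches values such as "3", "3.3" or " 4.0 " embedded in other text.
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


print(parse_rating(" 3.3 "))
```

Returning None instead of raising keeps a scraping loop running when an element happens to be empty.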
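Answer 1 (score: 0)
You should instead read the text of the following element:
<span class="bigRating strong margRtSm h1">3.3</span>
As you can see, it contains the rating you need.
Also, since you need several ratings, the right way to do this in a loop is to count how many rating elements are on the page, so that your code runs exactly that many times.
Final code: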
from selenium import webdriver
import time
import re
driver = webdriver.Chrome(executable_path=r'//path')
init_url = 'https://www.glassdoor.co.in/Reviews/bangalore-hcl-technologies-reviews-SRCH_IL.0,9_IM1091_KE10,26.htm'
driver.get(init_url)
driver.maximize_window()
time.sleep(5)
i=1
count = len(driver.find_elements_by_xpath("//span[@class='bigRating strong margRtSm h1']"))
while i <= count:
    rate = driver.find_element_by_xpath("(//span[@class='bigRating strong margRtSm h1'])[" + str(i) + "]")
    print(rate.text)
    i += 1
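As a possible alternative to counting and then building an indexed XPath, the list returned by a single lookup can be iterated directly. The sketch below assumes Selenium 4, where the find_elements_by_xpath helpers were removed in favour of find_elements(by, value); the function name collect_texts is hypothetical.

```python
def collect_texts(driver, xpath):
    """Return the visible text of every element matching xpath."""
    # "xpath" is the string value of Selenium's By.XPATH constant, so this
    # is equivalent to driver.find_elements(By.XPATH, xpath).
    elements = driver.find_elements("xpath", xpath)
    return [element.text for element in elements]


# Hypothetical usage with a live driver:
# ratings = collect_texts(driver, "//span[@class='bigRating strong margRtSm h1']")
```

This queries the page once instead of once per element, and an empty result is simply an empty list rather than an exception.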
Edit: Yes, for a URL like this one, you can extract the ratings as follows:
from selenium import webdriver
import time
import re
driver = webdriver.Chrome(executable_path=r'//path')
init_url = 'https://www.glassdoor.co.in/Reviews/DXC-Technology-Reviews-E1603125.htm'
driver.get(init_url)
driver.maximize_window()
time.sleep(5)
i=1
count = len(driver.find_elements_by_xpath("//span[@class='rating']/span[@class='value-title']"))
print(count)
while i <= count:
    rate = driver.find_element_by_xpath("(//span[@class='rating']/span[@class='value-title'])[" + str(i) + "]")
    print(rate.get_attribute("title"))
    i += 1
The rating is stored in the title attribute of the <span> element, not in its text, so I extracted it with get_attribute("title").
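To see why reading the text would return nothing here, the same idea can be tried offline on a static snippet. This is only a sketch: SAMPLE_HTML is a hypothetical fragment modelled on the XPath above, not markup copied from the site, and the standard-library parser stands in for the browser.

```python
from html.parser import HTMLParser


class ValueTitleParser(HTMLParser):
    """Collect the title attribute of every <span class="value-title">."""

    def __init__(self):
        super().__init__()
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Unlike the XPath, this does not check the parent span's class.
        if tag == "span" and "value-title" in classes and attributes.get("title"):
            self.ratings.append(float(attributes["title"]))


# Hypothetical markup: the spans are empty, the rating lives in title.
SAMPLE_HTML = """
<span class="rating"><span class="value-title" title="3.0"></span></span>
<span class="rating"><span class="value-title" title="5.0"></span></span>
"""


def extract_ratings(html):
    parser = ValueTitleParser()
    parser.feed(html)
    return parser.ratings


print(extract_ratings(SAMPLE_HTML))  # prints [3.0, 5.0]
```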
To extract the sub-ratings (such as Work/Life Balance), use the following solution:
# Reset the counter, since the previous loop left i at count + 1.
i = 1
count = len(driver.find_elements_by_xpath("//ul[@class='undecorated']//div[@class='minor']"))
while i <= count:
    sub_rating = driver.find_element_by_xpath("(//ul[@class='undecorated']//div[@class='minor'])[" + str(i) + "]/following-sibling::span")
    sub_rating_title = driver.find_element_by_xpath("(//ul[@class='undecorated']//div[@class='minor'])[" + str(i) + "]")
    print(sub_rating_title.get_attribute("innerHTML"), "-", sub_rating.get_attribute("title"))
    i += 1
Console output:
Work/Life Balance - 2.0
Culture & Values - 2.0
Career Opportunities - 3.0
Comp & Benefits - 3.0
Senior Management - 2.0
Work/Life Balance - 5.0
Culture & Values - 3.0
Career Opportunities - 4.0
Comp & Benefits - 2.0
Senior Management - 2.0
Work/Life Balance - 3.0
Culture & Values - 3.0
Career Opportunities - 3.0
Comp & Benefits - 3.0
Senior Management - 3.0
Work/Life Balance - 5.0
Culture & Values - 5.0
Career Opportunities - 5.0
Comp & Benefits - 2.0
Senior Management - 2.0
Work/Life Balance - 3.0
Culture & Values - 3.0
Career Opportunities - 2.0
Comp & Benefits - 2.0
Senior Management - 1.0
Work/Life Balance - 3.0
Culture & Values - 3.0
Career Opportunities - 4.0
Comp & Benefits - 5.0
Senior Management - 2.0
Work/Life Balance - 3.0
Culture & Values - 4.0
Career Opportunities - 3.0
Comp & Benefits - 2.0
Senior Management - 3.0
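The output above is one flat list, so the sub-ratings of different reviews run together. One way to split them per review is sketched below; it assumes a new review starts whenever a category name repeats, and the function name group_by_review is hypothetical. The sample pairs are the first ten lines of the output.

```python
def group_by_review(pairs):
    """Split a flat list of (category, value) pairs into one dict per review."""
    reviews = []
    current = {}
    for category, value in pairs:
        # A repeated category means the previous review is complete.
        if category in current:
            reviews.append(current)
            current = {}
        current[category] = float(value)
    if current:
        reviews.append(current)
    return reviews


pairs = [
    ("Work/Life Balance", "2.0"),
    ("Culture & Values", "2.0"),
    ("Career Opportunities", "3.0"),
    ("Comp & Benefits", "3.0"),
    ("Senior Management", "2.0"),
    ("Work/Life Balance", "5.0"),
    ("Culture & Values", "3.0"),
    ("Career Opportunities", "4.0"),
    ("Comp & Benefits", "2.0"),
    ("Senior Management", "2.0"),
]

reviews = group_by_review(pairs)
print(len(reviews))                        # prints 2
print(reviews[1]["Career Opportunities"])  # prints 4.0
```

A list of dicts like this can be passed straight to pandas.DataFrame, which matches the pandas import in the question.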