The HTML div (identified by its class) that contains the data I want to print:
<div class="gs_a">LR Binford - American antiquity, 1980 - cambridge.org </div>
This is my code so far:
from selenium import webdriver

def Author (SearchVar):
    driver = webdriver.Chrome("/Users/tutau/Downloads/chromedriver")
    driver.get ("https://scholar.google.com/")
    SearchBox = driver.find_element_by_id ("gs_hdr_tsi")
    SearchBox.send_keys(SearchVar)
    SearchBox.submit()
    At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
    print (At)

Author("dog")
All that shows up when I print is:
selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")
It is not the text. I'm new to Selenium; any help is appreciated.
Answer 0 (score: 1)
Introduction
First, I recommend using a faster parser to select your target from Selenium's page_source.
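# put this below SearchBox.submit()
import lxml.html  # cssselect() below also needs the cssselect package installed

CSS_SELECTOR = '#gs_res_ccl_mid > :nth-child(1) > .gs_ri > .gs_a'  # Define css
source = driver.page_source  # Get all html
At_raw = lxml.html.document_fromstring(source)  # Convert
At = At_raw.cssselect(CSS_SELECTOR)  # Select by CSS (returns a list of matches)

Solution 1
Then you need to extract the text from the matched element with text_content() and encode it properly.
At = At[0].text_content().encode('utf-8')  # Get text of the first match and encode
print(At)  # on Python 3, drop .encode('utf-8') to print a str instead of bytes

Solution 2
If At contains multiple lines and unicode, you can also remove them.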
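For the "multiple lines and unicode" case, one possible cleanup is sketched below. It is not taken from the answer: the helper name clean_text and the sample string are made up for illustration. It collapses every run of whitespace (including newlines) into a single space and then drops any non-ASCII characters.

```python
import re

def clean_text(raw):
    """Collapse whitespace runs into single spaces, then drop non-ASCII characters."""
    # \s also matches newlines, tabs and non-breaking spaces in Python 3 strings.
    single_line = re.sub(r"\s+", " ", raw)
    # Encoding with errors="ignore" silently discards anything outside ASCII.
    ascii_only = single_line.encode("ascii", errors="ignore").decode("ascii")
    return ascii_only.strip()

# Hypothetical input resembling a multi-line result with a stray ellipsis.
sample = "LR Binford\n - American antiquity,\u00a01980 - cambridge.org\u2026"
print(clean_text(sample))
```

Dropping non-ASCII characters is lossy (accented author names would be damaged), so this is only appropriate when plain ASCII output is really what you need.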
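Answer 1 (score: 1)
It looks like you were almost there. Given the HTML you shared and your code attempt, you were very close to seeing the desired output.
After the following line of code executes: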
At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
At refers to the desired element (a list containing a single WebElement). In the next step, when you call print (At), the WebElement object itself is printed, as follows:
selenium.webdriver.remote.webelement.WebElement (session="9aa956e2bd51f510dd626f6937b01c0e", element="0.6506218589189958-1")
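The same effect can be reproduced without a browser. The sketch below uses a made-up FakeElement class as a stand-in for a WebElement (it is not Selenium's class) to show that printing a list of objects shows each object's representation, while the text has to be read from the attribute explicitly.

```python
class FakeElement:
    """Hypothetical stand-in for a WebElement; it only carries a .text attribute."""

    def __init__(self, text):
        self.text = text

# find_elements_* returns a list, so the stand-in is wrapped in a list too.
matches = [FakeElement("LR Binford - American antiquity, 1980 - cambridge.org")]

printed_list = str(matches)     # what print(matches) would display
printed_text = matches[0].text  # what print(matches[0].text) would display

print(printed_list)  # an object representation, not the text
print(printed_text)  # the actual text
```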
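Now, as per your question, if you want to extract the text LR Binford - American antiquity, 1980 - cambridge.org, you have to use one of the following on the element:
text: gets the text of the element.
get_attribute(attributeName): gets the given attribute or property of the element.
So you need to change the line of code:
print (At)
to either of the following (At is a list, so take an element from it first):
Using text:
print(At[0].text)
Using get_attribute(attributeName):
print(At[0].get_attribute("innerHTML"))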
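The two options are not identical: text gives the rendered text, whereas innerHTML gives the element's markup, which can include nested tags. If you go the innerHTML route and only want the text, the tags have to be stripped. Below is a minimal sketch using only the standard library; the strip_tags helper and the sample markup (including the link target) are invented for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(inner_html):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(inner_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

# Hypothetical innerHTML value; the href is a placeholder.
inner_html = '<a href="/citations?user=example">LR Binford</a> - American antiquity, 1980 - cambridge.org'
print(strip_tags(inner_html))
```

In most cases using text directly is simpler; the sketch is only meant to show what the difference between the two results looks like.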
Your own code with minor tweaks:
# -*- coding: UTF-8 -*-
from selenium import webdriver

def Author (SearchVar):
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get ("https://scholar.google.com/")
    SearchBox = driver.find_element_by_name("q")
    SearchBox.send_keys(SearchVar)
    SearchBox.submit()
    At = driver.find_elements_by_css_selector ('#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a')
    for item in At:
        print(item.text)

Author("dog")
Console output:
…, RJ Marles, LS Pellicore, GI Giancaspro, TL Dog - Drug Safety, 2008 - Springer
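The code above uses the find_element_by_* helpers and the chrome_options / executable_path arguments, which newer Selenium 4 releases no longer accept. A rough equivalent for current versions might look like the sketch below. It assumes a recent Selenium 4 install that can locate the Chrome driver on its own; the function name first_result_meta and the constant RESULT_META_SELECTOR are made up here, and the selector is simply the one from the question.

```python
# Selector copied from the question; the constant name is illustrative.
RESULT_META_SELECTOR = "#gs_res_ccl_mid > div:nth-child(1) > div.gs_ri > div.gs_a"

def first_result_meta(search_term):
    """Search Google Scholar and return the matched elements' text as a list of str."""
    # Imported inside the function so the module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: the driver binary is resolved automatically
    try:
        driver.get("https://scholar.google.com/")
        search_box = driver.find_element(By.NAME, "q")
        search_box.send_keys(search_term)
        search_box.submit()
        elements = driver.find_elements(By.CSS_SELECTOR, RESULT_META_SELECTOR)
        # Read .text from each element instead of printing the elements themselves.
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even if a lookup fails
```

Calling print(first_result_meta("dog")) would then print a list of strings rather than WebElement objects.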
Answer 2 (score: 0)
You are printing the element. Print the element's text (for example At[0].text, since At is a list), not At.