XPath to get the text of a div with a given class?

Time: 2015-10-22 22:59:16

Tags: python html xml xpath lxml

I'm trying to scrape business names from a Google local search results page, for example this one:

(screenshot)

Given the following:

(screenshot)

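...I would have thought that the XPath //div[@class="_rl"] or //*[@class="_rl"] would be enough, but neither returns anything. I know I need to make the query more explicit/precise, but how exactly?

I'm using Python and lxml, if that's relevant.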

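3 answers:

Answer 0: (score: 1)

You mention Python, but judging by your screenshot, it looks like you want to get the XPath from the browser?

In Chrome Developer Tools, you can right-click the element and choose "Copy XPath".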

(screenshot: Chrome "Copy XPath")

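Answer 1: (score: 1)

You are capturing the element that contains the text, not the text contained in the element. You need to either read the text attribute of the returned objects, or extend your XPath expression so that it selects the text specifically: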

#from the object
list_of_elements = tree.xpath('//div[@class ="_rl"]')
for l in list_of_elements:
    print(l.text)

#capture the text
list_of_text = tree.xpath('//div[@class ="_rl"]/text()')
for l in list_of_text:
    print(l)
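
To see the difference between an element and its text on a small scale, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra installs (lxml elements offer the same .text attribute and itertext() method). The markup and the business names are made up for illustration. Note that .text only holds the text before the first child element, so itertext() is the safer choice when the name may be wrapped in nested tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the class name in the question
SAMPLE = """
<html><body>
  <div class="_rl">Acme Chiropractic</div>
  <div class="_rl"><span>Nested</span> Name Clinic</div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# The query returns Element objects, not strings
elements = root.findall(".//div[@class='_rl']")

# .text holds only the text that precedes the first child element
direct_text = [el.text for el in elements]

# itertext() walks all descendant text nodes, so nested markup is included
full_text = ["".join(el.itertext()).strip() for el in elements]

print(direct_text)
print(full_text)
```

The second div has no text of its own before the span, which is why reading .text alone can come back empty even though the query matched.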

Answer 2: (score: 1)

Here is working code:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0
from selenium.webdriver.common.by import By
import lxml.html
from bs4 import BeautifulSoup


driver = webdriver.Chrome()
driver.get("https://www.google.com/ncr#q=chiropractors%2BNew+York,+NY&rflfq=1&rlha=0&tbm=lcl")
# Wait until the page body is present before reading the rendered source
WebDriverWait(driver, 1000).until(EC.presence_of_all_elements_located((By.TAG_NAME, "body")))


# 1) Selenium only: the first line of each result block's text is the name
print('Using pure python-----------' * 2)
d = driver.find_elements(By.XPATH, "//div[@class='_pl _ki']")
for i in d:
    print(i.text.split("\n")[0])

# 2) BeautifulSoup on the rendered page source
print('Using bs4-----------------' * 2)
soup = BeautifulSoup(driver.page_source, 'html.parser')
raw = soup.find_all('div', class_='_rl')
for i in raw:
    print(i.text)

# 3) lxml on the rendered page source (cssselect() needs the cssselect package)
print('Using lxml---------------' * 2)
tree = lxml.html.fromstring(driver.page_source)
e = tree.cssselect("._rl")
for i in e:
    d = i.xpath('.//text()')
    print(''.join(d))


driver.close()

Prints:

Using pure python-----------Using pure python-----------
TAI Chiropractic
Body in Balance Chiropractic
Lamb Chiropractic
Esprit Wellness
Jamie H Bassel DC PC
Madison Avenue Chiropractic Center
Howard Benedikt DC
44'Th Street Chiropractic
Rockefeller Health & Medical Chiropractic
Frank J. Valente, DC, PC
Dr. Robert Shire
5th Avenue Chiropractic
Peterson Chiropractic
NYC Chiropractic Solutions
20 East Chiropractic of Midtown
GRAND CENTRAL CHIROPRACTIC WELLNESS CENTER
Park Avenue Chiropractic Center - Dr Nancy Jacobs
Murray Hill Chiropractic PC
Empire Sports & Spine
JW Chiropractic
Using bs4-----------------Using bs4-----------------
TAI Chiropractic
Body in Balance Chiropractic
Lamb Chiropractic
Esprit Wellness
Jamie H Bassel DC PC
Madison Avenue Chiropractic Center
Howard Benedikt DC
44'Th Street Chiropractic
Rockefeller Health & Medical Chiropractic
Frank J. Valente, DC, PC
Dr. Robert Shire
5th Avenue Chiropractic
Peterson Chiropractic
NYC Chiropractic Solutions
20 East Chiropractic of Midtown
GRAND CENTRAL CHIROPRACTIC WELLNESS CENTER
Park Avenue Chiropractic Center - Dr Nancy Jacobs
Murray Hill Chiropractic PC
Empire Sports & Spine
JW Chiropractic
Using lxml---------------Using lxml---------------
TAI Chiropractic
Body in Balance Chiropractic
Lamb Chiropractic
Esprit Wellness
Jamie H Bassel DC PC
Madison Avenue Chiropractic Center
Howard Benedikt DC
44'Th Street Chiropractic
Rockefeller Health & Medical Chiropractic
Frank J. Valente, DC, PC
Dr. Robert Shire
5th Avenue Chiropractic
Peterson Chiropractic
NYC Chiropractic Solutions
20 East Chiropractic of Midtown
GRAND CENTRAL CHIROPRACTIC WELLNESS CENTER
Park Avenue Chiropractic Center - Dr Nancy Jacobs
Murray Hill Chiropractic PC
Empire Sports & Spine
JW Chiropractic
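
One detail in the code above is worth isolating: the XPath test @class='_pl _ki' compares the whole attribute string, while the CSS selector ._rl matches a single class token. The sketch below shows the difference on made-up markup, using only the standard library. ElementTree's XPath subset has no contains(), so the token match is done in Python here; with lxml the usual XPath 1.0 idiom would be contains(concat(' ', normalize-space(@class), ' '), ' _rl '). The helper has_class is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second div carries an extra class
SAMPLE = """
<body>
  <div class="_rl">First Clinic</div>
  <div class="_rl extra">Second Clinic</div>
  <div class="other">Not a result</div>
</body>
"""

root = ET.fromstring(SAMPLE)


def has_class(element, name):
    """Return True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


# Exact string comparison: misses class="_rl extra"
exact = [el.text for el in root.findall(".//div[@class='_rl']")]

# Token match: the behaviour of a CSS selector such as ._rl
by_token = [el.text for el in root.iter("div") if has_class(el, "_rl")]

print(exact)
print(by_token)
```

If an exact @class comparison returns nothing on a real page, checking whether the element carries additional classes is a reasonable first step.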