I am using this XPath:
.//div[contains(@itemprop, "creator")]/p[contains(@class, "glbComentarios-texto-comentario")]/text()
It works perfectly in Chrome DevTools:
[Screenshot: result of the XPath on the G1 web page]
Here is my code:
import requests
from bs4 import BeautifulSoup
from lxml import html
page = requests.get('http://g1.globo.com/sao-paulo/noticia/camara-municipal-de-sp-aprova-concessao-do-pacaembu-a-iniciativa-privada.ghtml')
tree = html.fromstring(page.content)
comments = tree.xpath('.//div[contains(@itemprop, "creator")]/p[contains(@class, "glbComentarios-texto-comentario")]/text()')
But I only get an empty list. I also tried Scrapy and BeautifulSoup, with no success either.
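One way to narrow this down is to check whether the comment markup is present at all in the HTML that the server returns. This is only a sketch, and it rests on an unconfirmed assumption: that the comments may be inserted by JavaScript after the page loads, which would explain why DevTools sees them and `requests` does not. The helper `count_class` and both sample strings below are hypothetical and are not taken from the real page; the example uses only the standard library.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = dict(attrs).get("class") or ""
        if self.token in classes.split():
            self.count += 1


def count_class(html_text, token):
    """Hypothetical helper: number of elements carrying the class token."""
    parser = ClassCounter(token)
    parser.feed(html_text)
    return parser.count


# Made-up samples (assumptions, not the real G1 markup):
# one where the comment is in the HTML, one where only a script tag is.
static_html = (
    '<div itemprop="creator">'
    '<p class="glbComentarios-texto-comentario">hi</p></div>'
)
js_shell = '<div id="comments"></div><script src="comments.js"></script>'

print(count_class(static_html, "glbComentarios-texto-comentario"))
print(count_class(js_shell, "glbComentarios-texto-comentario"))
```

With the real page, the same check would be `count_class(page.text, "glbComentarios-texto-comentario")`. If that returns 0, the elements are not in the downloaded HTML, so no XPath run against `page.content` can match them, whichever parser is used.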