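OK, I can't figure out why 'title_List' never returns anything.
I tried changing the User-Agent, but the result is the same.
Can anyone tell me what is wrong with my code?
According to the Chrome XPath Helper extension, the XPath is correct, as shown in the image below.
Here is my code: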
#coding=utf-8
import re
import urllib2
import urllib
from lxml import etree

def init():
    url = 'https://tieba.baidu.com/f?kw=%E7%BE%8E%E5%A5%B3&ie=utf-8&pn=0'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50"}
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request).read()
    print(1)
    print(response)
    #shape response get data
    get_title(response)
    print(4)

#get title href
def get_title(response):
    #html->xpath
    html_dom = etree.HTML(response)
    ts = html_dom.xpath('//div[@class="threadlist_lz clearfix"]/div/a[@class="j_th_tit"]/@href')
    print(2)
    print(ts)
    for href in ts:
        full_link='https://tieba.baidu.com'+str(href)
        print(3)
        print(full_link)
Result (I removed some of the markup because of the length limit):
1
<!DOCTYPE html>
<!--STATUS OK-->
<html>
...
<div class="threadlist_lz clearfix">
<div class="threadlist_title pull_left j_th_tit
">
<i class="icon-member-top" alt="会员置顶" title="会员置顶" ></i><i class="icon-good" alt="精品" title="精品" ></i>
<a rel="noreferrer" href="/p/5006374769" title="【答疑解惑】误删误封绿色通道" target="_blank" class="j_th_tit ">【答疑解惑】误删误封绿色通道</a>
</div><div class="threadlist_author pull_right">
...
2
[]
4
Answer 0 (score: 1)
The @class value in your XPath expression is wrong. In the page source the attribute is "j_th_tit " (with a trailing space), and an @class="..." test is an exact string comparison, so "j_th_tit" without the space matches nothing. Change it to "j_th_tit " and it will match:

//div[@class="threadlist_lz clearfix"]/div/a[@class="j_th_tit "]/@href

To avoid this kind of error, it is usually better to use the contains(...) function.
This approach is less precise, because it matches any class value that contains the given substring, but it is sufficient in most cases.
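The difference between the three predicates can be checked offline with a small sketch like the one below. It assumes lxml is installed. The SAMPLE_HTML string is a trimmed copy of the markup from the question, and the helper name extract_links is made up for illustration. The contains(@class, "j_th_tit") form is one plausible way to write the predicate, not necessarily the exact expression the answer had in mind.

```python
from lxml import etree

# Trimmed copy of the markup shown in the question; note the trailing
# space inside class="j_th_tit " on the <a> element.
SAMPLE_HTML = '''
<div class="threadlist_lz clearfix">
  <div class="threadlist_title pull_left j_th_tit
">
    <a rel="noreferrer" href="/p/5006374769" target="_blank" class="j_th_tit ">title</a>
  </div>
</div>
'''

# Common prefix of all three XPath expressions.
BASE = '//div[@class="threadlist_lz clearfix"]/div/a'


def extract_links(html, xpath):
    """Hypothetical helper: run an XPath query and build absolute thread URLs."""
    dom = etree.HTML(html)
    return ['https://tieba.baidu.com' + str(href) for href in dom.xpath(xpath)]


# Exact comparison without the trailing space: no match.
exact_no_space = extract_links(SAMPLE_HTML, BASE + '[@class="j_th_tit"]/@href')

# Exact comparison with the trailing space: matches.
exact_with_space = extract_links(SAMPLE_HTML, BASE + '[@class="j_th_tit "]/@href')

# Substring test: matches regardless of surrounding whitespace.
with_contains = extract_links(SAMPLE_HTML, BASE + '[contains(@class, "j_th_tit")]/@href')

print(exact_no_space)
print(exact_with_space)
print(with_contains)
```

Because contains(...) is a plain substring test, it would also accept a class such as "j_th_tit_extra". That is the loss of precision mentioned above.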