I recently ran into a problem while playing with Scrapy.
Consider this situation:
A JSON array is placed inside a textarea wrapper:
<div id="example">
<textarea>
[{"type":"\u8d4f\u82b1","sub":[{"sname":"\u5a7a\u6e90","sid":"440a8cc57454b0656bc5b1f1","surl":"wuyuan"},{"sname":"\u7f57\u5e73","sid":"ded1be78733fb2cc70c021f4","surl":"luoping"},{"sname":"\u6d1b\u9633","sid":"f77d6111526fb820db6efaf3","surl":"luoyang"},{"sname":"\u5174\u5316","sid":"700cac1d394cbc1a34e154c3","surl":"xinghua"},{"sname":"\u4eac\u90fd","sid":"bf34d1b18b0036d9222ef5d9","surl":"jingdu"}]},{"type":"\u8d2d\u7269","sub":[{"sname":"\u8fea\u62dc","sid":"b710d6f3c821e59aba1c29d5","surl":"dibai"},{"sname":"\u6cf0\u56fd","sid":"e524735691f256c0a53e8ffb","surl":"taiguo"},{"sname":"\u97e9\u56fd","sid":"d7e376b3690f23d1dc24bbfb","surl":"hanguo"},{"sname":"\u9999\u6e2f","sid":"d7e376b3690f23d1dc24bbfb","surl":"xianggang"},{"sname":"\u65b0\u52a0\u5761","sid":"b15d7068c2f160adbc2c83fb","surl":"xinjiapo"}]},{"type":"\u8e0f\u9752","sub":[{"sname":"\u9ec4\u5c71","sid":"6701f1153f0fd41f975775f3","surl":"huangshan"},{"sname":"\u91ce\u4e09\u5761","sid":"da679613cd1729d6be7822fd","surl":"yesanpo"},{"sname":"\u4e39\u971e\u5c71","sid":"b6ca381b2120f8ccda001fdd","surl":"danxiashan"},{"sname":"\u5341\u6e21","sid":"36eb66d6c4365b0af2c94bfe","surl":"shidu"},{"sname":"\u5e90\u5c71","sid":"9a06d41f975780992b0773fa","surl":"lushan"}]},{"type":"\u6444\u5f71","sub":[{"sname":"\u5357\u4eac","sid":"1e7451eeeb69e222608ca2f4","surl":"nanjing"},{"sname":"\u745e\u58eb","sid":"ad2cb39f09736a7351eea7fb","surl":"ruishi"},{"sname":"\u676d\u5dde","sid":"440a8cc57454b0656bc5b1f1","surl":"hangzhou"},{"sname":"\u4e91\u5357","sid":"17070a5c91ca872746461bf4","surl":"yunnan"},{"sname":"\u5e03\u62c9\u683c","sid":"1e7b51eeeb69e222608ca2fb","surl":"bulage"}]}]
</textarea>
</div>
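With xpath('//*[@id="example"]/textarea/text()') I can get the JSON array, but there is a problem; see the image below:
[image: xpath]
(The XPath in the image differs from the one above because one is the example and the other comes from my dev environment, so it is no big deal.)
As you can see, there is extra content at the beginning and the end: a leading [u'\r\n and a trailing \r\n ].
I want to convert the text to JSON so that I can iterate over the array. I used json.loads(), but it raises an error: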
>>> json.loads(response.xpath('//*[@id="J-head-menu"]/li[1]/textarea/text()').extract())
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
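If I remove the extra content by hand, it works, so I think I need to strip whatever surrounds the JSON. I have tried many approaches, but all of them failed. The value returned by response.xpath(...).extract() is not a string but a list, so how do I solve this? I am out of ideas. Anyone? Thanks in advance!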
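Answer 0 (score: 1)
The XPath function normalize-space should help remove the unwanted leading and trailing whitespace. xpath('normalize-space(//*[@id="example"]/textarea/text())') should work.
However, it also replaces every run of whitespace characters with a single space, including runs inside the JSON string values. See also: https://developer.mozilla.org/en-US/docs/Web/XPath/Functions/normalize-space.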
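The whitespace caveat can be illustrated without Scrapy. The sketch below is only an approximation: normalize_space is a hypothetical pure-Python stand-in for the XPath function (it is not part of Scrapy or lxml), and the "title" key is made up for the demonstration.

```python
import json
import re


def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): collapse each run of
    space, tab, CR or LF into one space, then trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


# Hypothetical textarea content: the JSON value contains two consecutive spaces.
raw = '\r\n    [{"title": "two  spaces"}]\r\n    '

collapsed = json.loads(normalize_space(raw))
trimmed = json.loads(raw.strip())

print(repr(collapsed[0]["title"]))  # the inner double space has been collapsed
print(repr(trimmed[0]["title"]))    # strip() only touches the ends
```

If the JSON values may contain meaningful whitespace, trimming only the ends of the extracted string is the safer choice.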
Answer 1 (score: 0)
response.xpath('//*[@id="example"]/textarea/text()').extract()[0].strip()
This solved the problem!
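A minimal sketch of why this works, assuming the extracted value looks like the list shown in the question. It runs without Scrapy: the extracted list is simulated with a trimmed sample of the JSON above, and parse_textarea_json is a hypothetical helper name. The TypeError comes from passing the whole list to json.loads(); picking the first element is the essential step.

```python
import json

# Simulated result of .extract(): a list holding one string, padded with the
# whitespace that surrounds the JSON inside the <textarea>. The payload is a
# trimmed sample of the array from the question.
extracted = [
    '\r\n    '
    + r'[{"type":"\u8d2d\u7269","sub":[{"sname":"\u8fea\u62dc","surl":"dibai"},'
    + r'{"sname":"\u6cf0\u56fd","surl":"taiguo"}]}]'
    + '\r\n    '
]


def parse_textarea_json(extracted_list):
    """Hypothetical helper: decode the first extracted text node as JSON."""
    if not extracted_list:
        return None  # the XPath matched nothing
    return json.loads(extracted_list[0].strip())


# Passing the list itself reproduces the error from the question.
try:
    json.loads(extracted)
except TypeError as exc:
    list_error = exc

data = parse_textarea_json(extracted)

# Iterate over the decoded array, as the question intended.
surls = [item["surl"] for group in data for item in group["sub"]]
print(surls)
```

Scrapy selectors also offer extract_first(), which returns the first match (or None when nothing matches) and avoids the IndexError that [0] raises on an empty result.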