Scrapy: selecting a ul by its class with XPath doesn't work

Date: 2018-03-01 09:49:35

Tags: python xpath scrapy

I'm scraping this page: https://movies.yahoo.com.tw/movieinfo_main.html/id=7819

In a terminal I opened it with scrapy shell 'https://movies.yahoo.com.tw/movieinfo_main.html/id=7819'

I want to extract the six href values from the li elements inside the ul.

When I try to get the li tags with response.xpath('//ul[@class="trailer_list imglist slick-initialized slick-slider"]/li'), I get an empty list [].

I then tried response.xpath('//div[@class="l_box_inner"]/ul/li/a/@href').extract(), and this is what I got:

In [14]: response.xpath('//div[@class="l_box_inner"]/ul/li/a/@href').extract()
Out[14]: 
[u'https://movies.yahoo.com.tw/name_main/1000',
 u'https://movies.yahoo.com.tw/name_main/2595',
 u'https://movies.yahoo.com.tw/video/%E9%81%8A%E6%88%B2%E5%A4%9C%E6%AE%BA%E5%BF%85%E6%AD%BB-%E4%B8%AD%E6%96%87%E9%A0%90%E5%91%8A-095130014.html?movie_id=7819',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189047',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189050',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189053',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189056',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189059',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189062',
 u'https://movies.yahoo.com.tw/post/169756772517/\u5091\u68ee\u8c9d\u7279\u66fc\u6372\u9032\u5931\u63a7\u904a\u6232\u591c-\u5168\u662f\u4ed6\u60f9\u7684\u798d']

But I only want the six href values whose IDs are 189047, 189050, 189053, 189056, 189059 and 189062.
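If you already have the full list above, one stopgap that does not depend on the XPath at all is to keep only the links that carry a movie_photo_id query parameter. This is a minimal Python 3 sketch using only the standard library; photo_links and photo_id are hypothetical helper names, and HREFS is simply the output shown above copied into a list.

```python
from urllib.parse import parse_qs, urlparse

# The hrefs returned by //div[@class="l_box_inner"]/ul/li/a/@href, copied from the output above
HREFS = [
    'https://movies.yahoo.com.tw/name_main/1000',
    'https://movies.yahoo.com.tw/name_main/2595',
    'https://movies.yahoo.com.tw/video/%E9%81%8A%E6%88%B2%E5%A4%9C%E6%AE%BA%E5%BF%85%E6%AD%BB-%E4%B8%AD%E6%96%87%E9%A0%90%E5%91%8A-095130014.html?movie_id=7819',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189047',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189050',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189053',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189056',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189059',
    'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189062',
    'https://movies.yahoo.com.tw/post/169756772517/\u5091\u68ee\u8c9d\u7279\u66fc\u6372\u9032\u5931\u63a7\u904a\u6232\u591c-\u5168\u662f\u4ed6\u60f9\u7684\u798d',
]


def photo_links(hrefs):
    """Hypothetical helper: keep only links whose query string contains movie_photo_id."""
    return [h for h in hrefs if 'movie_photo_id' in parse_qs(urlparse(h).query)]


def photo_id(href):
    """Hypothetical helper: return the movie_photo_id value of a single photo link."""
    return parse_qs(urlparse(href).query)['movie_photo_id'][0]


# Print each photo ID next to its link
for link in photo_links(HREFS):
    print(photo_id(link), link)
```

This only filters what the broader query already returned; selecting the right ul directly avoids pulling in the name, video and post links in the first place.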

What is the correct XPath to get only those six href values from the li elements?

Any help would be appreciated. Thanks in advance.

1 answer:

Answer 0 (score: 1)

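In the response Scrapy receives, the target ul seems to have fewer classes than in the source the browser renders. The missing slick-initialized and slick-slider classes are most likely added in the browser by the Slick carousel script, and Scrapy does not run JavaScript. Matching the class string as it appears in the raw HTML works: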

response.xpath('//ul[@class="trailer_list imglist"]/li/a/@href').extract()

Output:

[u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189047',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189050',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189053',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189056',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189059',
 u'https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189062']
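Why the original query returned []: in XPath, @class="..." compares the whole attribute string, not individual class names, so "trailer_list imglist slick-initialized slick-slider" can never equal the "trailer_list imglist" present in the downloaded HTML. The sketch below reproduces this with Python's standard xml.etree.ElementTree (which supports only a limited XPath subset) on a hypothetical, simplified snippet standing in for the raw page, with two items instead of six. ul_hrefs_with_class is a hypothetical helper that checks class names as whole tokens instead of comparing the full string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the HTML Scrapy downloads (two items, not six).
# The ul carries only the classes present before any JavaScript runs.
RAW_HTML = """
<div class="l_box_inner">
  <ul class="trailer_list imglist">
    <li><a href="https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189047">photo</a></li>
    <li><a href="https://movies.yahoo.com.tw/movieinfo_photos.html/id=7819?movie_photo_id=189050">photo</a></li>
  </ul>
</div>
"""

root = ET.fromstring(RAW_HTML.strip())  # root is the <div>

# Exact string match against the class list seen in the browser: nothing matches.
browser_match = root.findall("./ul[@class='trailer_list imglist slick-initialized slick-slider']/li/a")

# Exact string match against the class list in the raw HTML: both links match.
raw_match = root.findall("./ul[@class='trailer_list imglist']/li/a")


def ul_hrefs_with_class(tree, class_name):
    """Hypothetical helper: hrefs under ul elements whose class list contains class_name as a whole token."""
    hrefs = []
    for ul in tree.iter('ul'):
        if class_name in ul.get('class', '').split():
            hrefs.extend(a.get('href') for a in ul.findall('./li/a'))
    return hrefs


print(len(browser_match))  # 0
print([a.get('href') for a in raw_match])
print(ul_hrefs_with_class(root, 'trailer_list'))
```

In Scrapy itself, token-style matching is commonly written as response.xpath('//ul[contains(concat(" ", normalize-space(@class), " "), " trailer_list ")]/li/a/@href').extract(), or with a CSS selector as response.css('ul.trailer_list.imglist > li > a::attr(href)').extract(). Either should keep matching whether or not the extra slick-* classes are present (not verified against the live page). The single-token XPath may also catch any other ul that carries trailer_list, so check the result count.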