I am new to Scrapy and Python. I am trying to scrape the comments from the following URL, but the result is always empty: http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html
Here is my code:
from scrapy.spiders import Spider
from scrapy.selector import Selector
from tutorial.items import TutorialItem
import logging


class TutorialSpider(Spider):
    name = "vnexpress"
    allowed_domains = ["vnexpress.net"]
    start_urls = [
        "http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
    ]

    def parse(self, response):
        sel = Selector(response)
        commentList = sel.xpath('//div[@class="comment_item"]')
        items = []
        id = 0
        logging.log(logging.INFO, "TOTAL COMMENT : " + str(len(commentList)))
        for comment in commentList:
            item = TutorialItem()
            id = id + 1
            item['id'] = id
            item['mainId'] = 0
            item['user'] = comment.xpath('//span[@class="left txt_666 txt_11"]/b').extract()
            item['time'] = 'N/A'
            item['content'] = comment.xpath('//p[@class="full_content"]').extract()
            item['like'] = comment.xpath('//span[@class="txt_666 txt_11 right block_like_web"]/a[@class="txt_666 txt_11 total_like"]').extract()
            items.append(item)
        return items
Thanks for reading.
Answer 0 (score: 3)
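It looks like the comments are inserted into the page by JavaScript after the page has loaded.
Scrapy does not execute JavaScript; it only downloads the HTML that the server returns. Try opening the page in your browser with JavaScript disabled, and you should see the page as Scrapy sees it.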
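The same check can be done in code: count the comment elements in the HTML as downloaded and compare that with what the browser shows after its scripts have run. Below is a minimal sketch using only the Python standard library. The two HTML strings, the `box_comment` id, and the helper names `ClassCounter` and `count_elements` are made up for illustration; they are not taken from the real vnexpress.net page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # A class attribute may hold several space-separated names.
        # (The XPath in the question instead requires an exact match.)
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_elements(html, tag, css_class):
    """Return how many <tag class="... css_class ..."> elements html contains."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical raw HTML, as a client without JavaScript receives it:
# the comment container is empty and a script would fill it in later.
raw_html = """
<html><body>
  <div id="box_comment"></div>
  <script>/* loads the comments after the page has loaded */</script>
</body></html>
"""

# Hypothetical markup after a browser has run that script.
rendered_html = """
<html><body>
  <div id="box_comment">
    <div class="comment_item"><p class="full_content">First</p></div>
    <div class="comment_item"><p class="full_content">Second</p></div>
  </div>
</body></html>
"""

print(count_elements(raw_html, "div", "comment_item"))       # 0
print(count_elements(rendered_html, "div", "comment_item"))  # 2
```

If the count is zero for the HTML Scrapy downloads (for example the contents of `response.text` inside `parse`), the selector is not the problem: the elements are simply not in that response.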
You have a few options: