Question

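I am interested in scraping reader comment threads from Globe and Mail articles (http://www.theglobeandmail.com/). For example, given the following comments page, I would like output like the one shown below: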

http://www.theglobeandmail.com/opinion/it-doesnt-matter-who-won-the-debate-america-has-already-lost/article32314064/comments/

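Output

Comment1 omigosh 2 days ago I just can't believe......

Comment1.1 sirencall 2 days ago In a basket......

Comment1.2 Mr. Atoz 2 days ago I totally agree. I have the same......

...

Comment2 David in Peachland 2 days ago Wow! I spent 40 minutes of what is left of my life......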
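The numbering in the desired output above (Comment1, Comment1.1, Comment2) implies a nested thread structure. Below is a minimal sketch of how such labels could be produced once the comments are available. The `Comment` class, its field names, and the sample values are hypothetical placeholders, not anything provided by the site or by scrapy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    # Hypothetical container for one comment and its replies.
    author: str
    age: str
    text: str
    replies: List["Comment"] = field(default_factory=list)


def format_thread(comments, prefix=""):
    """Return one line per comment, labelled 1, 1.1, 1.2, 2, ..."""
    lines = []
    for i, c in enumerate(comments, start=1):
        label = f"{prefix}{i}"
        lines.append(f"Comment{label} {c.author} {c.age} {c.text}")
        # Replies reuse the parent's label as their prefix.
        lines.extend(format_thread(c.replies, prefix=f"{label}."))
    return lines


# Placeholder data, only to show the labelling.
thread = [
    Comment("userA", "2 days ago", "first comment", replies=[
        Comment("userB", "2 days ago", "first reply"),
        Comment("userC", "2 days ago", "second reply"),
    ]),
    Comment("userD", "2 days ago", "second comment"),
]

for line in format_thread(thread):
    print(line)
```

The recursion handles deeper nesting as well (for example Comment1.1.1), although the example output only shows two levels.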

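I am using Python 3. I have explored the following Python libraries: scrapy, urllib, newspaper, BeautifulSoup. The problem is that the HTML page I get from the URL does not contain the comment text. Below I explain how I used scrapy.

With scrapy, I created a project named ScrapeNews: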

scrapy startproject ScrapeNews

Then I wrote the following code in the spider.

import scrapy


class NewsSpider(scrapy.Spider):
    name = 'news'
    # start_urls is conventionally a list rather than a set.
    start_urls = [
        'http://www.theglobeandmail.com/opinion/it-doesnt-matter-who-won-the-debate-america-has-already-lost/article32314064/',
        'http://www.theglobeandmail.com/opinion/it-doesnt-matter-who-won-the-debate-america-has-already-lost/article32314064/comments/',
    ]

    def parse(self, response):
        '''
        Save the raw HTML body of each response to a local file.

        :param response: the scrapy response for one of the start URLs
        :return: None
        '''
        # Both URLs end with "/", so [-2] is the last path segment.
        page = response.url.split("/")[-2]
        filename = 'gnm-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)

Then I ran the following, which created gnm-comments.html.

scrapy crawl news
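The output file names follow from how `parse` slices the URL. As a small illustration, the same slicing can be run outside scrapy. The helper name `output_filename` is introduced here only for the example.

```python
def output_filename(url: str) -> str:
    # Same slicing as in parse(): the URL ends with "/", so the last
    # element of the split is an empty string and [-2] is the final
    # path segment.
    page = url.split("/")[-2]
    return "gnm-%s.html" % page


base = ("http://www.theglobeandmail.com/opinion/"
        "it-doesnt-matter-who-won-the-debate-america-has-already-lost/"
        "article32314064/")

print(output_filename(base))                # gnm-article32314064.html
print(output_filename(base + "comments/"))  # gnm-comments.html
```

So the crawl writes two files, one for the article page and one for the comments page.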

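I do not see the comment text or comment tags in this HTML at all. When I inspect the comments on the web page in the browser, I see that each comment is marked up with: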

<p class="comment">

But this tag does not appear in the HTML extracted with scrapy. I suspect I am not fetching the right thing from the site.
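One way to check this more rigorously than by eye is to count the `<p class="comment">` tags in the saved HTML. The sketch below uses only the standard library. The names `CommentCounter` and `count_comment_tags` are made up for this example. A count of zero would be consistent with the comments being added by JavaScript after the initial page load, but that is an assumption about the site, not something confirmed here.

```python
from html.parser import HTMLParser


class CommentCounter(HTMLParser):
    """Counts <p> start tags whose class attribute includes "comment"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "comment" in classes:
            self.count += 1


def count_comment_tags(html: str) -> int:
    parser = CommentCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Tiny inline samples standing in for a downloaded page.
with_comments = '<div><p class="comment">hello</p><p class="comment x">hi</p></div>'
without_comments = '<div id="comments"></div><script src="app.js"></script>'

print(count_comment_tags(with_comments))
print(count_comment_tags(without_comments))
```

To apply it to the crawl output, read gnm-comments.html as text (for example with `open("gnm-comments.html", encoding="utf-8")`) and pass the contents to `count_comment_tags`.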

Any idea what might be going wrong? I would greatly appreciate any relevant solutions, explanations, or pointers.

Scraping reader comment threads from Globe and Mail articles
