Trouble inserting scraped content (via cheerio) in Meteor

Time: 2013-12-04 17:04:01

Tags: javascript mongodb meteor cheerio

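Specifically, I'm trying (with a very simple app I built while working through the Discover Meteor book) to scrape posts from Hacker News and insert them into a Meteor collection, so that the "new posts" page is populated with the 20 or so articles scraped from the Hacker News front page.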

I have a button in my header wired to a Meteor.call event:

Template.header.events({
   'click .hnPull': function() {
       Meteor.call('getHnArticles');
   }
});

That then (successfully) calls a Meteor.methods function named getHnArticles:

getHnArticles: function() {

    hn_result = Meteor.http.get('http://news.ycombinator.com');
    console.log(hn_result);
    $ = cheerio.load(hn_result.content);

    var result_set = [];

    $('span.comhead').each(function(i, element){ // for every <span class='comhead'>, do the following
          var a = $(this).prev();
          var rank = a.parent().parent().text();
          var title = a.text();
          var url = a.attr('href');
          var subtext = a.parent().parent().next().children('.subtext').children();
          var points = $(subtext).eq(0).text();
          var username = $(subtext).eq(1).text();
          var comments = $(subtext).eq(2).text();
          //parsed metadata object
        var metadata = {
            rank: parseInt(rank),
            title: title,
            url: url,
            points: parseInt(points),
            username: username,
            comments: parseInt(comments)
        };
        result_set.push(metadata);
    });
    console.log(result_set);

    for (var i = 0; i<20; i++) {

        var hn_post = result_set[i];

        var postAttributes = {
            url: hn_post.url,
            title: hn_post.title,
            message: 'Scraped automatically from Hacker News'
        };

        var user = Meteor.user(),
        postWithSameLink = Posts.findOne({url: postAttributes.url});

        // ensure the user is logged in

        if (!user)
            throw new Meteor.Error(401, "You need to login to post new stories");

        // Make sure the post has a title.  it can't be blank
        if (!postAttributes.title)
            throw new Meteor.Error(422, 'Please fill in a headline');

        // Make sure this isn't a duplicate post or repost
        if (postAttributes.url && postWithSameLink) {
            throw new Meteor.Error(302, 
                'This link has already been posted', 
                postWithSameLink._id);
        }

        // pick out the whitelisted keys
        // This keeps a nefarious client from monkeying around with our db
        var post = _.extend(_.pick(postAttributes, 'url', 'title', 'message'), {
            userId: user._id,
            author: user.username,
            submitted: new Date().getTime(),
            commentsCount: 0, 
            upvoters: [],
            votes: 0
        });

        Posts.insert(post);

    }


},

The end result is that the top-ranked post from Hacker News gets inserted just fine, but none of the others do.

I sent the array named result_set to console.log, and it prints the front page:

I202504-11:50:57.551(-5)? [ { rank: 1,
I202504-11:50:57.551(-5)?     title: 'Is iOS7 jailbroken yet?',
I202504-11:50:57.551(-5)?     url: 'https://isios7jailbrokenyet.com/',
I202504-11:50:57.551(-5)?     points: 37,
I202504-11:50:57.552(-5)?     username: 'sethbannon',
I202504-11:50:57.552(-5)?     comments: 12 },
I202504-11:50:57.552(-5)?   { rank: 2,
I202504-11:50:57.552(-5)?     title: 'Valve joins the Linux Foundation',
I202504-11:50:57.552(-5)?     url: 'http://thenextweb.com/insider/2013/12/04/valve-joins-    linux-foundation-prepares-linux-powered-steam-os-steam-machines/',
I202504-11:50:57.553(-5)?     points: 276,
I202504-11:50:57.553(-5)?     username: 'kwestro',
I202504-11:50:57.554(-5)?     comments: 117 },
I202504-11:50:57.554(-5)?   { rank: 3,
I202504-11:50:57.555(-5)?     title: 'Google Acquires Seven Robot Companies, Wants Big Role in Robotics',
I202504-11:50:57.555(-5)?     url:  'http://spectrum.ieee.org/automaton/robotics/industrial-robots/google-acquisition-seven-  robotics-companies#.Up9CGN-hd98.hackernews',
I202504-11:50:57.555(-5)?     points: 71,
I202504-11:50:57.555(-5)?     username: 'eguizzo',
I202504-11:50:57.555(-5)?     comments: 29 },
I202504-11:50:57.556(-5)?   { rank: 4,
I202504-11:50:57.556(-5)?     title: 'Evading Airport Security',
I202504-11:50:57.556(-5)?     url: 'https://www.schneier.com/blog/archives/2013/12/evading_airport.html',
I202504-11:50:57.556(-5)?     points: 87,

And so on. I get a nice big array.

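Any idea what's going wrong here? Am I inserting records too quickly as the code loops and writes to the Meteor collection, or is it a Mongo insert problem?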

Thanks! I'm very new to Meteor and I love it, but I'm still trying to get my head around the asynchronous Node stuff in Meteor.

Edit: I forgot to add that when I query the MongoDB instance, it shows that only the top link was inserted.

1 Answer:

Answer 0: (score: 1)

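D'oh!! I figured out what was wrong! I need to use a continue statement instead of throwing an error when this code runs, because the throw aborts the whole method (and with it the loop) as soon as one duplicate link is found: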

if (postAttributes.url && postWithSameLink) {
        throw new Meteor.Error(302, 
            'This link has already been posted', 
            postWithSameLink._id);
    }
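
For illustration, here is a minimal sketch of the loop with continue in place of the throw. It is not the exact code from the app: makeFakeCollection and insertScraped are hypothetical names, and the in-memory object only imitates the findOne and insert calls used on Posts above so that the sketch can run outside Meteor. The Math.min guard is an extra assumption, added so the loop does not read past the end of an array with fewer than 20 entries.

```javascript
// Hypothetical in-memory stand-in for the Posts collection (assumption:
// only findOne-by-url and insert are needed for this sketch).
function makeFakeCollection() {
  var docs = [];
  return {
    findOne: function (query) {
      for (var i = 0; i < docs.length; i++) {
        if (docs[i].url === query.url) return docs[i];
      }
      return undefined;
    },
    insert: function (doc) {
      docs.push(doc);
      return docs.length;
    },
    count: function () {
      return docs.length;
    }
  };
}

// Insert up to `limit` scraped items, skipping bad or duplicate entries
// instead of throwing, so one duplicate no longer stops the whole run.
function insertScraped(collection, resultSet, limit) {
  var inserted = 0;
  var max = Math.min(limit, resultSet.length);
  for (var i = 0; i < max; i++) {
    var item = resultSet[i];

    // No title: skip this entry and move on to the next one.
    if (!item.title) continue;

    // Already stored: skip it. A throw here would end the loop early.
    if (item.url && collection.findOne({ url: item.url })) continue;

    collection.insert({
      url: item.url,
      title: item.title,
      message: 'Scraped automatically from Hacker News'
    });
    inserted++;
  }
  return inserted;
}

// Usage with made-up data: one URL is already stored, one entry has no title.
var posts = makeFakeCollection();
posts.insert({ url: 'http://example.com/a', title: 'Already stored' });
var insertedCount = insertScraped(posts, [
  { url: 'http://example.com/a', title: 'Duplicate' },
  { url: 'http://example.com/b', title: 'New B' },
  { url: 'http://example.com/c', title: '' },
  { url: 'http://example.com/d', title: 'New D' }
], 20);
console.log(insertedCount, posts.count());
```

In the real method the login check can probably stay as a throw before the loop, since a missing user should stop the whole call; only the per-item checks need to become continue.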