Question

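I'm trying to do some web scraping, and I want to display the data in JSON format.

My task is to extract each post from the website and display its related data in JSON format. My problem is that I can't seem to target the row and then target each id. I can hard-code an id in my code, but I want the program to find the ids by searching and console.log the data for each id in the row. Example: I want to get the title of the first post by its id.

I hope that makes sense. The website I am trying to extract data from: here

My code: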

 var express = require('express');
 var path = require('path');
 var request = require('request');
 var cheerio = require('cheerio');
 var fs = require('fs');
 var app = express();
 var port = 8080;

 var url= "https://news.ycombinator.com/";

 request(url, function(err,resp,body){
 var $ = cheerio.load(body);

   var title = $('tr');

   var uri
   var author
   var points
   var comments
   var rank

   var posts = {
       postTitle : title,
       postUri : uri,
       postAuthor : author,
       postPoints : points,
       postComments : comments,
       postRank : rank
   }

   console.log(posts)

   })

   app.listen(port);
   console.log('server is listening on' + port);

Answer 1

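The trick with Hacker News is that each post is displayed using three tr elements. That is why every element of rows holds three consecutive tr elements. Inside rows.map, each item is one post, so you can access its attributes "per row".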
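The grouping idea can be sketched on its own, without cheerio. This is only an illustration: the helper name chunk and the stand-in strings are assumptions, not part of the original answer, and leftover items are simply dropped here instead of subtracting the two trailing rows explicitly.

```javascript
// Split a flat list into consecutive groups of `size` items.
// Items that do not fill a whole group are dropped, which is how
// the trailing "More" rows end up being ignored in this sketch.
function chunk(items, size) {
  const groups = [];
  const fullGroups = Math.floor(items.length / size);
  for (let i = 0; i < fullGroups; i++) {
    groups.push(items.slice(size * i, size * (i + 1)));
  }
  return groups;
}

// Stand-in data: plain strings in place of real tr elements.
const trs = [
  'title-1', 'subtext-1', 'spacer-1',
  'title-2', 'subtext-2', 'spacer-2',
  'more-spacer', 'more-link'
];

const groupedRows = chunk(trs, 3);
console.log(groupedRows);
```

The code below applies the same grouping to the real tr elements, using slice on the cheerio selection.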

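const cheerio = require('cheerio');
const request = require('request');

const url = "https://news.ycombinator.com/";
request(url, function(err, resp, body) {
  // Stop early if the request failed; body would be undefined
  if (err) {
    console.error(err);
    return;
  }

  const $ = cheerio.load(body);

  const tr = $('.itemlist > tr');
  // Each post spans three tr elements; the last two tr elements are the More button
  const rowCount = Math.floor((tr.length - 2) / 3);
  const rows = [];

  for (let i = 0; i < rowCount; ++i) {
    // Group three consecutive tr elements into one "row"
    rows.push(tr.slice(3 * i, 3 * (i + 1)));
  }

  const res = rows.map(function(item) {
    return {
      postTitle: $(item).find('.storylink').text(),
      postUri: $(item).find('.storylink').attr('href'),
      // a link that directly follows another link (the comments link)
      postComments: $(item).find('a+ a').text(),
    };
  });

  console.log(res);
});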

This gives you:

[ { postTitle: 'CockroachDB beta-20161013',
    postUri: 'https://jepsen.io/analyses/cockroachdb-beta-20161013',
    postComments: '10 comments' },
  { postTitle: 'Attacking the Windows Nvidia  Driver',
    postUri: 'https://googleprojectzero.blogspot.com/2017/02/attacking-windows-nvidia-driver.html',
    postComments: '7 comments' },
  { postTitle: 'DuckDuckGo Donates $300K to Raise the Standard of Trust Online',
    postUri: 'https://spreadprivacy.com/2017-donations-d6e4e4230b88#.kazx95v27',
    postComments: '25 comments' },
... ]
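Since the question asks for JSON, the array above can be serialized with JSON.stringify. The following is a minimal sketch, not part of the original answer: the helper names parseCommentCount and postsToJson are hypothetical, the second sample post is made up, and treating a label without a leading number (such as "discuss") as zero comments is an assumption.

```javascript
// Turn a comment label such as "10 comments" into a number.
// Labels without a leading number (assumed here: "discuss") count as 0.
function parseCommentCount(label) {
  const match = /^(\d+)/.exec(label.trim());
  return match ? Number(match[1]) : 0;
}

// Convert the scraped post objects into a pretty-printed JSON string.
function postsToJson(posts) {
  const normalized = posts.map(function (post) {
    return Object.assign({}, post, {
      postComments: parseCommentCount(post.postComments)
    });
  });
  return JSON.stringify(normalized, null, 2);
}

// Sample input: the first entry is taken from the output above,
// the second is a made-up post used only for illustration.
const sample = [
  {
    postTitle: 'CockroachDB beta-20161013',
    postUri: 'https://jepsen.io/analyses/cockroachdb-beta-20161013',
    postComments: '10 comments'
  },
  {
    postTitle: 'Hypothetical post',
    postUri: 'https://example.com/',
    postComments: 'discuss'
  }
];

const json = postsToJson(sample);
console.log(json);
```

In the scraper itself, the same call would replace console.log(res) with console.log(postsToJson(res)).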
