Question

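I'm using Horseman to scrape a website so that I can build some charts from the extracted data.
My code manages to get the root element of each important section, but I don't know how to navigate the elements inside each one.
What I want to do is build JSON from some of the child elements, for example: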

the company name
company-stack (which contains a ul list)

Here is my code so far:

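const express = require('express');
const Horseman = require('node-horseman');
const router = express.Router();

router.get('/', function(req, res, next) {
  // All the web scraping magic will happen here
  const url = "http://www.welcometothejungle.co/stacks?q=&hPP=30&idx=cms_companies_stacks_production&p=";

  const pages = [0, 1, 2, 3, 4];
  pages.forEach((page) => {
    // One PhantomJS instance per results page
    const horseman = new Horseman();
    horseman
      .open(url + page)
      // .html(selector) resolves with the matched markup as a plain string
      .html('article')
      .then((text) => {
        console.log(text);
      })
      .close();
  });
  // Note: this runs immediately, before any of the scrapes above have finished
  res.render('index', { title: "Done" });
});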

How can I navigate the 'text' result variable?
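The 'text' value is just a string of HTML, so it has no methods for walking its child elements. One Horseman-only option is to do the navigation inside the page with `.evaluate()`, which runs a function in the PhantomJS page and hands its JSON-serializable return value back to Node. The sketch below is an assumption-laden illustration, not a tested scraper: `extractCompanies` and `scrapeCompanies` are hypothetical names, and the `.company-name` and `.company-stack li` selectors are guesses that must be checked against the real markup. The in-page function is written in ES5 because older PhantomJS builds do not understand ES2015 syntax such as arrow functions.

```javascript
// Runs inside the PhantomJS page via Horseman's .evaluate(), so it sticks
// to ES5 (no arrow functions, no const/let).
// The selectors are hypothetical; adjust them to the page's real markup.
// `root` defaults to the page's document; passing it explicitly lets the
// function be exercised outside the browser.
function extractCompanies(root) {
  root = root || document;
  var articles = root.querySelectorAll('article');
  return Array.prototype.map.call(articles, function (article) {
    var nameEl = article.querySelector('.company-name');
    var items = article.querySelectorAll('.company-stack li');
    return {
      name: nameEl ? nameEl.textContent.trim() : null,
      // Text of every <li> in the company's stack list
      stack: Array.prototype.map.call(items, function (li) {
        return li.textContent.trim();
      })
    };
  });
}

// Node side: open the page, run extractCompanies in it, log the JSON and
// close PhantomJS. Assumes node-horseman is installed; it is required
// lazily here so extractCompanies can be reused without it.
function scrapeCompanies(url) {
  var Horseman = require('node-horseman');
  var horseman = new Horseman();
  return horseman
    .open(url)
    .evaluate(extractCompanies)
    .then(function (companies) {
      console.log(JSON.stringify(companies, null, 2));
    })
    .close();
}
```

Inside the original `forEach`, `scrapeCompanies(url + page)` would take the place of the `.html('article')` chain.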

Answer 1

I managed to parse the data using another module called cheerio. If you have a way to do it with Horseman, that would be interesting!

