Question

url = http://www.simon.com/mall/anchorage-5th-avenue-mall/stores

The URL above lists all the stores in a particular mall. I want to get the list of every store in that mall from this page. Here is my code so far:

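var request = require('request');
var cheerio = require('cheerio');

// The store listing page given above
var url = 'http://www.simon.com/mall/anchorage-5th-avenue-mall/stores';

request(url, function(err, resp, body) {
    if (err) {
        console.log(err);
    } else {
        // Parse the HTML returned by the server
        var $ = cheerio.load(body);

        // Print the text of every store-name heading
        $('h2.card-secondary-title.name.copy').each(function() {
            var text = $(this).text();
            console.log(text);
        });
    }
});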

The page has this structure:

    <html>
      <head>
        <main id="simon" class>
          <section class="directory">
            <div id="root">
              ...
               <div class="directory-grid row">
                 ...
                   <h2 class="card-secondary-title name copy">5th Avenue Deli</h2>

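I can't scrape even a single store from the site. I have scraped several other sites with this same approach, but for some reason it doesn't work on this one.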

Answer 1

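The content you are trying to scrape is loaded via Ajax, so you cannot retrieve it with cheerio. Cheerio only parses the HTML returned by the initial request; it does not execute the page's JavaScript, so the store headings are never in the markup it sees.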

You can replicate the Ajax request directly and retrieve the information as JSON. The data you are looking at comes from this request:

https://api.simon.com/v1.2/tenant?lw=true&mallId=231

It returns the following:

[
  {
    "brandId": 48,
    "name": "5th Avenue Deli",  // This is the value you want
    /* ... */
  },
  /* ... */
]
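
Replicating that request from Node.js might look roughly like the sketch below. It assumes the endpoint still answers a plain GET with a JSON array shaped like the excerpt above, which may no longer be true. It uses Node's built-in `https` module rather than `request`, and the helper names `extractStoreNames` and `fetchStoreNames` are made up for illustration.

```javascript
const https = require('https');

// Endpoint quoted in the answer; mallId=231 is taken from that URL as-is.
const TENANT_URL = 'https://api.simon.com/v1.2/tenant?lw=true&mallId=231';

// Pull the "name" field out of each tenant object, skipping entries without one.
function extractStoreNames(tenants) {
  return tenants
    .filter(function (t) { return t && typeof t.name === 'string'; })
    .map(function (t) { return t.name; });
}

// Fetch the JSON and pass the store names to callback(err, names).
function fetchStoreNames(url, callback) {
  https.get(url, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      return callback(new Error('Unexpected status ' + res.statusCode));
    }
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      let names;
      try {
        names = extractStoreNames(JSON.parse(body));
      } catch (e) {
        return callback(e); // invalid JSON or an unexpected shape
      }
      callback(null, names);
    });
  }).on('error', callback);
}

// Usage (uncomment to run against the live API):
// fetchStoreNames(TENANT_URL, function (err, names) {
//   if (err) return console.error(err);
//   names.forEach(function (name) { console.log(name); });
// });
```

Keeping the parsing in its own function means it can be checked against a saved copy of the response without touching the network.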

I'm quite new to JavaScript, so I'm not sure what you mean by replicating the Ajax request directly. Could you explain in more detail?

A simple way to replicate the Ajax call is to inspect the request in Chrome Developer Tools (F12).

Then go to the Network tab > XHR filter > locate the request > right-click > Copy > Copy as cURL.

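From there, translate the cURL command into whatever library you want to use on the server side; the conversion is straightforward.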
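
As a rough illustration of that conversion: each `-H 'Name: value'` argument in the copied cURL command becomes one entry in a headers object, which Node's built-in `https.get` accepts in its options argument. The helper name `headersFromCurl` and the header values below are hypothetical, not the ones the real request sends; whether the API needs any particular header is not stated in the thread.

```javascript
// Turn the -H 'Name: value' arguments of a copied cURL command into a headers object.
function headersFromCurl(headerArgs) {
  const headers = {};
  headerArgs.forEach(function (line) {
    const idx = line.indexOf(':'); // split on the first colon only
    if (idx === -1) return;        // not a "Name: value" pair, skip it
    headers[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  });
  return headers;
}

// Hypothetical header values, for illustration only.
const headers = headersFromCurl([
  'Accept: application/json',
  'User-Agent: Mozilla/5.0'
]);

// The result can then be passed along with the request, for example:
// https.get(url, { headers: headers }, function (res) { /* ... */ });
```

Splitting on the first colon only keeps header values that themselves contain colons, such as URLs, intact.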

Scraping company names with Node.js and Cheerio
