Getting a variable value from a script tag with Cheerio

Date: 2020-01-26 22:29:52

Tags: node.js web-scraping cheerio

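Good afternoon! I am trying to get the value of the variable "var JS_WCACHE_CK =" inside the script tag below. I have tested and tried to adapt several pieces of code, but without success.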

<script>
    var JS_IDIOMA = "pt";
    var JS_LINK_ROOT = "https://tabuademares.com";
    var JS_RUTA_ASSETS = "/assets/";
    var CONF_FORMATO_HORA = 1;
    window.google_analytics_uacct = "UA-8166479-17";
    var JS_URL_ACTUAL="%2Fbr%2Fespirito-santo%2Fvitoria";
    var JS_FECHA_ACTUAL="2020-01-26+19%3A00";
    var JS_CODIGO_ESTACION="br56";
    var JS_WCACHE_CK="Mjg5Ng==";
    var JS_ACTIVAR_SERVIDOR_BACKUP=1;
    var JS_LATITUD="-20.32352";
    var JS_LONGITUD="-40.29919";
    var JS_ZOOM="12";   
</script>

The site link is: https://tabuademares.com/br/espirito-santo/vitoria

1 answer:

Answer 0 (score: 1)

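I would say Cheerio is not what you are looking for. Puppeteer is a better fit, because you need not only an HTML parser but also a JavaScript engine, so that you can interact with the page's scripts without resorting to evil things like eval: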

import { Browser, launch, Page, Response } from "puppeteer";

export class JsVarService {
  private browser!: Browser;
  private page!: Page;

  constructor(private url: string) {}

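  public async getVarValue(varName: string): Promise<string> {
    await this.getResponse();
    try {
      // Evaluate in the page context, where the inline script has already run.
      return <string>await this.page.evaluate(`window["${varName}"]`);
    } finally {
      // Always release the browser, even if the evaluation fails.
      await this.close();
    }
  }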

  private async getResponse(): Promise<Response | null> {
    this.browser = await launch();
    this.page = await this.browser.newPage();
    return this.page.goto(this.url);
  }

  private async close(): Promise<void> {
    // close() shuts down the Chromium process; disconnect() would only detach and leave it running.
    await this.browser.close();
  }
}

async function run(): Promise<void> {
  const url = "https://tabuademares.com/br/espirito-santo/vitoria";
  const varName = "JS_WCACHE_CK";
  const service = new JsVarService(url);
  console.log(await service.getVarValue(varName));
}

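run().catch((e) => {
  // Report the failure and exit non-zero instead of rethrowing inside the handler.
  console.error(e);
  process.exitCode = 1;
});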
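
If launching a full browser is heavier than you need, and the variable is a plain string literal in an inline script (as in the snippet from the question), a regular expression over the script text may be enough. The following is only a minimal sketch under that assumption: `extractStringVar` and `sampleScript` are hypothetical names introduced here, and the script text is hard-coded from the question rather than fetched. In practice it could come from, for example, Cheerio's `.html()` on the matching `script` element, provided the server returns the script in the initial HTML.

```typescript
// Hypothetical helper (not part of the original answer): pull a string-literal
// `var NAME = "value";` assignment out of raw script text without executing it.
function extractStringVar(scriptText: string, varName: string): string | null {
  // Escape regex metacharacters so the variable name is matched literally.
  const escaped = varName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match `var NAME = "..."` or `var NAME = '...'`, tolerating optional spaces.
  const pattern = new RegExp(`\\bvar\\s+${escaped}\\s*=\\s*(["'])(.*?)\\1`);
  const match = pattern.exec(scriptText);
  // Group 2 holds the text between the quotes; null means "not found".
  return match ? (match[2] ?? null) : null;
}

// Sample input copied from the <script> block in the question.
const sampleScript = `
    var JS_CODIGO_ESTACION="br56";
    var JS_WCACHE_CK="Mjg5Ng==";
    var JS_ACTIVAR_SERVIDOR_BACKUP=1;
`;

console.log(extractStringVar(sampleScript, "JS_WCACHE_CK")); // prints "Mjg5Ng=="
```

This only handles quoted string values on a single assignment; numeric values such as `JS_ACTIVAR_SERVIDOR_BACKUP=1`, or values computed at runtime, return `null` here and would still need the Puppeteer approach above.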