Question

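I need to scrape data from a website, but only after that data has finished loading.

A process runs in a loop from 1 to 200, and once the HTML itself has been processed up to 200, I need to get the result.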

1. Is this possible? I know it can be done with cheerio, but I can't find a way to capture the result in real time once the process has finished.

2. How can I get around CORS when I send the request with an axios HTTP GET?

I don't know how to use proxy in vue.config.js, and I couldn't find complete instructions on how to set it up.

Here is my code (for security reasons I have, of course, changed some of the data):

<template>
  <div class="hello">
    <h1>{{ msg }}</h1>
    <ul>
      <li v-for="(message, index) in messages" :key="index">
        <b>{{ message.ip }} [{{ message.type }}]:</b>
        {{ message.blocked }}
      </li>
    </ul>
  </div>
</template>

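<script>
import axios from "axios";
import * as cheerio from "cheerio";

export default {
  name: "ScrapIP",
  props: {
    msg: String
  },
  data() {
    // The component owns this list; pushing into a prop would mutate the parent's state
    return { messages: [] };
  },
  methods: {
    fetchUrl() {
      for (let i = 0; i < 5; i++) {
        const ip = "192.168.0." + i;
        const url = "http://xxx/yyy.org/lookup/" + ip + ".html";
        axios
          .get(url)
          .then(response => {
            // cheerio parses the HTML string as downloaded; it does not run the
            // page's scripts, so the delay below cannot change what it sees
            const $ = cheerio.load(response.data);
            const blocked = $(".global_data_cnt_DNSBLBlacklistTest").text().trim();

            setTimeout(() => {
              if (Number(blocked) === 243) {
                this.messages.push({
                  ip: ip,
                  type: "Blacklist Test",
                  blocked: blocked
                });
              }
            }, 10000);
          })
          // Surface network and CORS failures instead of leaving them unhandled
          .catch(error => console.error("Lookup failed for " + ip, error));
      }
    }
  },
  created() {
    this.fetchUrl();
  }
};
</script>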
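If the goal is to handle everything once all of the loop's requests have finished (rather than pushing matches one at a time from separate callbacks), one option is to map the IPs to promises and await them together. This is a sketch using Promise.allSettled, so one failed lookup does not discard the others; the helper name collectResults and its parameters are illustrative, not part of axios or cheerio.

```javascript
// Run every lookup concurrently, then filter the results once, after all settle.
// fetchHtml and extractCount are injected so the waiting logic does not depend on
// axios/cheerio directly; in the component they would wrap axios.get and cheerio.load.
async function collectResults(ips, fetchHtml, extractCount, expected) {
  const settled = await Promise.allSettled(
    ips.map(async ip => {
      const html = await fetchHtml(ip);
      return { ip, count: extractCount(html) };
    })
  );
  // Reached only after every request has either resolved or been rejected
  return settled
    .filter(r => r.status === "fulfilled" && r.value.count === expected)
    .map(r => r.value);
}

// Hypothetical wiring inside the component's fetchUrl (not executed here):
//   const ips = Array.from({ length: 5 }, (_, i) => "192.168.0." + i);
//   const matches = await collectResults(
//     ips,
//     ip => axios.get("http://xxx/yyy.org/lookup/" + ip + ".html").then(r => r.data),
//     html => Number(cheerio.load(html)(".global_data_cnt_DNSBLBlacklistTest").text().trim()),
//     243
//   );
//   this.messages = matches.map(m => ({ ip: m.ip, type: "Blacklist Test", blocked: m.count }));
```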

Answer 1

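I'm not very familiar with cheerio, but puppeteer might be useful for what you're trying to do. It launches a Chromium instance in the background to carry out the tasks you give it, so capturing the data after the loop finishes would be easier. The only downsides are speed and the fact that it runs in Node.js, not in the browser.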
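A minimal puppeteer sketch of that idea, assuming the page shows its progress as a plain number inside a single element. Unlike cheerio, which only parses the HTML string it is handed, puppeteer runs the page's own scripts, so the code can wait until the on-page loop reaches its target and then read the final value. Because it runs in Node.js, it would live in a script or backend rather than inside the Vue component. The function names, selector, URL, and target in the usage line are hypothetical.

```javascript
// Runs inside the page (puppeteer serializes it), so it must not use Node variables.
// True once the element exists and its numeric text has reached the target.
function counterReached(selector, target) {
  const el = document.querySelector(selector);
  return el !== null && Number(el.textContent.trim()) >= target;
}

async function scrapeAfterLoop(url, selector, target, timeoutMs = 60000) {
  const puppeteer = require("puppeteer"); // required lazily so this file loads without it
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Re-evaluates counterReached in the page until it returns true or the timeout hits
    await page.waitForFunction(counterReached, { timeout: timeoutMs }, selector, target);
    return await page.$eval(selector, el => el.textContent.trim());
  } finally {
    await browser.close(); // always release the Chromium process
  }
}

// Hypothetical usage:
// scrapeAfterLoop("https://example.org/page.html", ".counter", 200).then(console.log);
```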
