Indeed website scraping pagination issue

Date: 2021-05-19 15:27:18

Tags: excel vba web-scraping screen-scraping

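I am stuck on page navigation while scraping Indeed. My page navigation had been working fine, but it stopped a week ago and I can't seem to fix it. It just loops between page 1 and page 2, so in a 6-page search it visits pages 1 and 2 three times each.

I have tried several approaches, and all of them go from page 1 to page 2 and then back to page 1. Can anyone suggest what the correct class or querySelector is?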

Indeed

I am using this:

Do
    If pageNumber >= 6 Then Exit Do

    On Error Resume Next
    Set nextPageElement = HTML.getElementsByClassName("np")(0) ' first element with class "np" (next-page arrow)

    If nextPageElement Is Nothing Then Exit Do

    Application.Wait (Now + TimeValue("0:00:04"))
    nextPageElement.Click ' go to the next web page

    ' Wait until the browser has finished loading
    Do While objIE.Busy = True Or objIE.readyState <> 4
        DoEvents ' keep Excel responsive while waiting
    Loop

I have also tried this, but it doesn't seem to solve the problem:

Do
    If pageNumber >= 6 Then Exit Do

    Set nextPageElement = HTML.getElementsByClassName("pagination-list")(0)
    'Set nextPageElement = HTML.getElementsByClassName("pagination")(0)

    If Not nextPageElement Is Nothing Then
        nextPageElement.document.querySelector(".np").Click
        'nextPageElement.document.querySelector(".pn").Click
        'nextPageElement.document.querySelector("span.pn").Click
        'nextPageElement.document.querySelector("span.np").Click
        Application.Wait (Now + TimeValue("0:00:04"))
    Else
        Exit Do
    End If

    ' Wait until the browser has finished loading
    Do While objIE.Busy = True Or objIE.readyState <> 4
        DoEvents ' keep Excel responsive while waiting
    Loop

<nav role="navigation" aria-label="pagination">
  <div class="pagination" onmousedown="pclk(event);">
    <ul class="pagination-list">
      <li><b aria-current="true" aria-label="1" tabindex="0">1</b></li>
      <li>
        <a href="/jobs?q=manager&amp;l=london&amp;start=10" aria-label="2" data-pp="gQAPAAAAAAAAAAAAAAABpUM3WgA3AQAHXaNdLSF435uhaW440sBEPpyo6wu9yyY2dv2zPlwOkia1_Ad0YcnE3oC5V5SAQfMv9shHtgAA" onmousedown="addPPUrlParam &amp;&amp; addPPUrlParam(this);" rel="nofollow"><span class="pn">2</span></a>
      </li>
      <li>
        <a href="/jobs?q=manager&amp;l=london&amp;start=20" aria-label="3" data-pp="gQAeAAAAAAAAAAAAAAABpUM3WgBeAQEBCG4JQk5YFZSUl7gsr035q39vtzDHJitYqr4vM2MFHgSufSgx4aWdnXw91UygifbITTf_9Xd-zeBWg6A2eTj9AkpYnhX-rcAjN1nrUkthXyccybLf4M72Myp_oAAA" onmousedown="addPPUrlParam &amp;&amp; addPPUrlParam(this);"
          rel="nofollow"><span class="pn">3</span></a></li>
      <li>
        <a href="/jobs?q=manager&amp;l=london&amp;start=30" aria-label="4" data-pp="gQAtAAAAAAAAAAAAAAABpUM3WgCCAQMBCBIHEAgKK3TbOCxrwjMPxwTV6_j9fNaHMo3xNxBWr7N7iCbTP36N-nxYL04iAaq5_diz3DVQ63zhssHzded33JzRuIfyx9aiyFO2ElmBhWcblWNGlmhwv7P7-3rovVi7CVDOHjCKDCVg45A442qbMJ5Wo9JtMKsCponthuH3GQAA"
          onmousedown="addPPUrlParam &amp;&amp; addPPUrlParam(this);" rel="nofollow"><span class="pn">4</span></a></li>
      <li>
        <a href="/jobs?q=manager&amp;l=london&amp;start=40" aria-label="5" data-pp="gQA8AAAAAAAAAAAAAAABpUM3WgChAQIBCBISRVBBqQZzCe6zv5ptH6EOR3ZxgRqrJvxsJggbwhKYOoxC_WmDgzvp-t1_I5-ajTgabdUP917NrwmP_ZTSJKw7Qsu2h2XODIZnEGpR4EUkRNB8BJX7y01xkWMfbFMYqgQQyGS_4mI9NrbWaont2mhvbhw4_6obi9V3Dq16okhKL7ATYpwifpNN_iYSoSsoyTM2DFqJO76sH6r8Cg4AAA"
          onmousedown="addPPUrlParam &amp;&amp; addPPUrlParam(this);" rel="nofollow"><span class="pn">5</span></a></li>
      <li>
        <a href="/jobs?q=manager&amp;l=london&amp;start=10&amp;pp=gQAPAAAAAAAAAAAAAAABpUM3WgA3AQAHXaNdLSF435uhaW440sBEPpyo6wu9yyY2dv2zPlwOkia1_Ad0YcnE3oC5V5SAQfMv9shHtgAA" aria-label="Next" data-pp="gQAPAAAAAAAAAAAAAAABpUM3WgA3AQAHXaNdLSF435uhaW440sBEPpyo6wu9yyY2dv2zPlwOkia1_Ad0YcnE3oC5V5SAQfMv9shHtgAA"
          onmousedown="addPPUrlParam &amp;&amp; addPPUrlParam(this);" rel="nofollow">
          <span class="pn">
          <span class="np">
          <svg width="24" height="24" fill="none"><path d="M10 6L8.59 7.41 13.17 12l-4.58 4.59L10 18l6-6-6-6z" fill="#2D2D2D"></path>
          </svg></span>
          </span>
        </a>
      </li>
    </ul>
  </div>
</nav>
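
One possible reading of the markup above: `getElementsByClassName("np")(0)` returns the first arrow on the page. If later pages also render a Previous arrow with the same `np` class ahead of the Next arrow (an assumption, since only the page 1 markup is shown here), index 0 would send the browser back to page 1. Below is a minimal sketch that targets the Next link by its `aria-label` and navigates to its `href` instead of clicking. `GoToNextPage` and `ScrapeAllPages` are hypothetical names, and `objIE` is assumed to be an existing InternetExplorer instance already showing the first results page.

```vba
Option Explicit

' Hypothetical helper: follows the "Next" link on the current page.
' Returns False when no such link exists (assumed to mean the last page).
Private Function GoToNextPage(ByVal objIE As Object) As Boolean
    Dim nextLink As Object

    ' Select the anchor by aria-label so a "Previous" arrow is never picked
    Set nextLink = objIE.document.querySelector("a[aria-label='Next']")
    If nextLink Is Nothing Then Exit Function

    ' Navigate to the link target rather than simulating a click
    objIE.navigate nextLink.href

    ' Wait until the browser has finished loading the new page
    Do While objIE.Busy Or objIE.readyState <> 4
        DoEvents
    Loop

    GoToNextPage = True
End Function

' Hypothetical driver: visits at most 6 result pages.
Public Sub ScrapeAllPages(ByVal objIE As Object)
    Dim pageNumber As Long

    pageNumber = 1
    Do
        ' ... scrape objIE.document for the current page here ...
        If pageNumber >= 6 Then Exit Do
        If Not GoToNextPage(objIE) Then Exit Do
        pageNumber = pageNumber + 1
    Loop
End Sub
```

Reading `objIE.document` afresh on every call avoids working against a stale `HTML` reference left over from the previous page, which may or may not be a factor in the original loop.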

Thanks in advance, as always.

############### Update, Thursday 20 May ####################

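The website is Indeed; this is the UK version. I did not post this link because it only has the job search page, with no pagination. This link, New Link, is the page you get once search criteria have been entered, and it does have pagination, which is why I posted it in my first post.

Thanks for looking.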

################# Update 22/5/2021 ####################

Also posted here: Mr Excel

0 answers:

No answers yet