Problem looping through page number links with Selenium

Date: 2019-04-28 03:36:43

Tags: python-3.x selenium beautifulsoup selenium-chromedriver

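I'm working with Selenium and Beautiful Soup in Python 3.6. I'm trying to click through the pages listed in the pagination footer. Each click on a number in the footer takes me to the next page, where I scrape some data from an element and add it to a list. In the code below everything works until I reach 8: the next span contains only "…" instead of a number. You have to click "…", and then more numbers are added to the footer. Any tips on how to handle this would be greatly appreciated.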

Code:

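import math
import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is the already-open webdriver, currently showing page 1 of the results
soup = BeautifulSoup(driver.page_source, "html.parser")
emptLst = [item['href'] for item in soup.select('a.job-card-search__link-wrapper')]

# 503 results at 14 per page; ceil so a partial last page is not dropped
n_pages = math.ceil(503 / 14)

# Page 1 is already scraped and is rendered as a plain <span> (no "Page 1" button),
# so start clicking at page 2
for page in range(2, n_pages + 1):
    driver.find_element(By.CSS_SELECTOR, f'[aria-label="Page {page}"]').click()
    time.sleep(3)  # give the new results time to load

    # Re-parse the page: the old `soup` still holds page 1's HTML
    soup = BeautifulSoup(driver.page_source, "html.parser")
    LnkLst = [item['href'] for item in soup.select('a.job-card-search__link-wrapper')]
    emptLst.extend(LnkLst)  # `emptLst + LnkLst` alone builds a new list and discards it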

Page source:

<section class="search-results-pagination-section">
                      <artdeco-pagination class="artdeco-pagination    pv5">
<!---->
    <ul class="artdeco-pagination__pages artdeco-pagination__pages--number">
          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number active selected">
    <span>1</span>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 2" data-ember-action="" data-ember-action-255="255">
      <span>2</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 3" data-ember-action="" data-ember-action-258="258">
      <span>3</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 4" data-ember-action="" data-ember-action-261="261">
      <span>4</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 5" data-ember-action="" data-ember-action-264="264">
      <span>5</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 6" data-ember-action="" data-ember-action-267="267">
      <span>6</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 7" data-ember-action="" data-ember-action-270="270">
      <span>7</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 8" data-ember-action="" data-ember-action-273="273">
      <span>8</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number">
  <button data-ember-action="" data-ember-action-276="276" data-is-animating-click="true">
    <span>…</span>
  </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 23" data-ember-action="" data-ember-action-279="279">
      <span>23</span>
    </button>
</li>

    </ul>

<!----></artdeco-pagination>


                    </section>
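One possible way to handle the "…" button, sketched under assumptions read off the page source above: the active page is an `li` with class `active`, numbered buttons carry `aria-label="Page N"`, and the ellipsis button has no `aria-label` and contains only `<span>…</span>`. The helper names `click_page` and `current_page` and both selectors are hypothetical, not from the question. The helper first looks for the numbered button; if it is missing, it clicks the nearest "…" after the active page and looks again. It passes the strings `"css selector"` and `"xpath"` (the values of Selenium's `By.CSS_SELECTOR` and `By.XPATH`) and uses `find_elements`, which returns an empty list instead of raising, so no exception handling is needed. Whether clicking "…" only reveals more numbers or also navigates is not confirmed by the question, so the active page is checked on every pass.

```python
import time

# Selenium's By.CSS_SELECTOR and By.XPATH are exactly these strings; using them
# directly keeps the helper free of selenium imports.
CSS = "css selector"
XPATH = "xpath"

# Hypothetical selectors derived from the page source shown in the question.
ACTIVE_PAGE = "li.artdeco-pagination__indicator.active"
NEXT_ELLIPSIS = (
    "//li[contains(concat(' ', normalize-space(@class), ' '), ' active ')]"
    "/following-sibling::li/button[normalize-space(span)='\u2026']"
)


def current_page(driver):
    """Return the label of the active page (e.g. '8'), or None if not found."""
    active = driver.find_elements(CSS, ACTIVE_PAGE)
    return active[0].text.strip() if active else None


def click_page(driver, page, delay=3, max_expansions=1):
    """Make `page` the active results page, expanding a '…' button if needed.

    Returns True once `page` is active or its button was clicked, False if
    neither its button nor a following '…' button can be found.
    """
    target = str(page)
    expansions = 0
    while True:
        if current_page(driver) == target:
            return True  # e.g. clicking '…' already navigated here
        buttons = driver.find_elements(CSS, f'[aria-label="Page {page}"]')
        if buttons:
            buttons[0].click()
            time.sleep(delay)  # crude wait for the results to reload
            return True
        if expansions >= max_expansions:
            return False
        ellipses = driver.find_elements(XPATH, NEXT_ELLIPSIS)
        if not ellipses:
            return False
        ellipses[0].click()  # assumed to reveal more page numbers
        time.sleep(delay)
        expansions += 1
```

In the question's loop, the line that clicks the `aria-label` button could be replaced with a call to `click_page(driver, page)`, breaking out of the loop when it returns False. An explicit `WebDriverWait` would be more reliable than the fixed `time.sleep`.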

1 answer:

Answer 0 (score: 0)

If possible, I'd suggest building the URLs for the new pages manually. Many websites simply add or change a page parameter in the URL.

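For example, take the Urban Outfitters website. Its clothing listings are paginated, and the URL of the first page looks like this (I'm looking at the sale category here):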

https://www.urbanoutfitters.com/sale

If I look at the URL of the second page of sale items, I see that they just add an extra parameter to the URL:

https://www.urbanoutfitters.com/sale?page=2

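This works the same way for every page other than the first. I ran into a problem similar to the one you're dealing with and found this approach much easier and less error-prone.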
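A minimal sketch of that URL-building approach, using only the standard library's `urllib.parse`. The helper name `page_url` is hypothetical, and the parameter name `page` is taken from the Urban Outfitters example; other sites may use a different name (or a result offset instead of a page number), so it is worth checking the address bar after clicking page 2 by hand.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url, page, param="page"):
    """Return the URL of results page `page`.

    Page 1 is returned unchanged, matching the example where the first page
    has no page parameter; other pages get `param=page` added to the query
    string (replacing any existing value for `param`).
    """
    if page <= 1:
        return base_url
    parts = urlparse(base_url)
    # Keep every other query parameter (including repeated keys) as-is
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(page)))
    return urlunparse(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    for n in range(1, 4):
        print(page_url("https://www.urbanoutfitters.com/sale", n))
```

Each URL can then be opened with `driver.get(...)` and parsed with BeautifulSoup as before, which avoids clicking through the footer, "…" buttons included.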