Using the page footer with Selenium

Time: 2019-04-23 04:36:12

Tags: selenium web-scraping beautifulsoup python-3.6

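I am using Selenium and Beautiful Soup to try to scroll through the content on a page. I tried to load all 503 posts with the code below, but realized that the page does not scroll and load more content. Instead, the footer has page numbers that can be clicked to load the next page. Can anyone suggest how to click from one page to the next? I have included the page source below. For example, if I only want to click page 2, can I use a CSS selector to find the element?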

Code:

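import time
from bs4 import BeautifulSoup

# `driver` is assumed to be an already-initialized Selenium WebDriver on the results page.
soup = BeautifulSoup(driver.page_source, 'html.parser')

emptLst = []

# Attempt to load all 503 posts (12 per page) by scrolling to the bottom each time.
for i in range(int(round(503/12))):
    print(i)
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(3)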

Page source:

<section class="search-results-pagination-section">
                      <artdeco-pagination class="artdeco-pagination    pv5">
<!---->
    <ul class="artdeco-pagination__pages artdeco-pagination__pages--number">
          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number active selected">
    <span>1</span>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 2" data-ember-action="" data-ember-action-252="252">
      <span>2</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 3" data-ember-action="" data-ember-action-255="255">
      <span>3</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 4" data-ember-action="" data-ember-action-258="258">
      <span>4</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 5" data-ember-action="" data-ember-action-261="261">
      <span>5</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 6" data-ember-action="" data-ember-action-264="264">
      <span>6</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 7" data-ember-action="" data-ember-action-267="267">
      <span>7</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 8" data-ember-action="" data-ember-action-270="270">
      <span>8</span>
    </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number">
  <button data-ember-action="" data-ember-action-273="273">
    <span>…</span>
  </button>
</li>

          <li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
    <button aria-label="Page 21" data-ember-action="" data-ember-action-276="276">
      <span>21</span>
    </button>
</li>

    </ul>

<!----></artdeco-pagination>


                    </section>

1 Answer:

Answer 0 (score: 1)

You can use an attribute = value selector to target a button, for example:

driver.find_element_by_css_selector('[aria-label="Page 2"]').click()
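
Recent Selenium 4 releases dropped the `find_element_by_*` helpers in favor of `find_element(by, value)`, so the same click might be written as in the sketch below. The helper names `page_button_selector` and `click_page` are hypothetical, and the literal `"css selector"` is used because it is the string value of Selenium's `By.CSS_SELECTOR`; `driver` is assumed to be a WebDriver already on the results page.

```python
# String value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def page_button_selector(page):
    """Build the attribute=value selector for a numbered pagination button."""
    return '[aria-label="Page {}"]'.format(page)


def click_page(driver, page):
    """Click the pagination button for `page` (hypothetical helper)."""
    driver.find_element(CSS, page_button_selector(page)).click()
```

With a live session this would be called as `click_page(driver, 2)`.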

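It would help to have a URL to test with, because I am not sure whether the currently visible 21 really is the last page. If it is, you can collect all the buttons and extract the last page number from the last button in the list. Then loop over all the pages by constructing the aria-label attribute value to click: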

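# Collect all pagination buttons; the last one is assumed to hold the last page number.
buttons = driver.find_elements_by_css_selector('.artdeco-pagination__pages button')
pages = int(buttons[-1].text)

if pages > 1:
    for page in range(2, pages + 1):
        # Build the aria-label value for this page and click its button.
        driver.find_element_by_css_selector('[aria-label="Page {}"]'.format(page)).click()
        # do something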
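
The loop above could be wrapped up roughly as follows. This is only a sketch under a few assumptions: the function names `last_page_number` and `iter_page_sources` are made up for illustration, the fixed `time.sleep` pause is borrowed from the question's code and is a crude stand-in for an explicit wait, and the largest numeric button label is assumed to be the last page (the "…" button is skipped).

```python
import time

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def last_page_number(button_texts):
    """Return the largest numeric label, ignoring entries such as the '…' button."""
    stripped = (text.strip() for text in button_texts)
    numbers = [int(text) for text in stripped if text.isdecimal()]
    return max(numbers) if numbers else 1


def iter_page_sources(driver, pause=3.0):
    """Yield page_source for page 1, then for each later page after clicking it."""
    buttons = driver.find_elements(CSS, ".artdeco-pagination__pages button")
    pages = last_page_number(button.text for button in buttons)
    yield driver.page_source
    for page in range(2, pages + 1):
        selector = '[aria-label="Page {}"]'.format(page)
        driver.find_element(CSS, selector).click()
        time.sleep(pause)  # fixed pause, as in the question; not a real readiness check
        yield driver.page_source
```

Each yielded string can then be handed to `BeautifulSoup(source, 'html.parser')` in place of the "do something" step.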