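I'm using Python 3.6 with Selenium and Beautiful Soup. I'm trying to click through the pages listed in the footer (the pagination). Each time I click a number in the footer, it takes me to that page. I then scrape some data from an element and add it to a list. In the code below everything works until I reach 8: the next span contains only "…" instead of a number. You have to click "…", and then more numbers are added to the footer. Any tips on how to handle this would be greatly appreciated.
Code: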
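import math
import time
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
# `driver` is an already-open WebDriver showing page 1 of the results
soup = BeautifulSoup(driver.page_source, "html.parser")
emptLst = [item['href'] for item in soup.select('a.job-card-search__link-wrapper')]
for i in range(2, math.ceil(503 / 14) + 1):  # 503 results, 14 per page; page 1 has no button
    driver.find_element(By.CSS_SELECTOR, '[aria-label="Page ' + str(i) + '"]').click()
    time.sleep(3)  # give the next page time to load before reading it
    soup = BeautifulSoup(driver.page_source, "html.parser")  # re-parse the newly loaded page
    LnkLst = [item['href'] for item in soup.select('a.job-card-search__link-wrapper')]
    emptLst.extend(LnkLst)  # `emptLst+LnkLst` on its own discarded the result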
Page source:
<section class="search-results-pagination-section">
<artdeco-pagination class="artdeco-pagination pv5">
<!---->
<ul class="artdeco-pagination__pages artdeco-pagination__pages--number">
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number active selected">
<span>1</span>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 2" data-ember-action="" data-ember-action-255="255">
<span>2</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 3" data-ember-action="" data-ember-action-258="258">
<span>3</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 4" data-ember-action="" data-ember-action-261="261">
<span>4</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 5" data-ember-action="" data-ember-action-264="264">
<span>5</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 6" data-ember-action="" data-ember-action-267="267">
<span>6</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 7" data-ember-action="" data-ember-action-270="270">
<span>7</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 8" data-ember-action="" data-ember-action-273="273">
<span>8</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number">
<button data-ember-action="" data-ember-action-276="276" data-is-animating-click="true">
<span>…</span>
</button>
</li>
<li class="artdeco-pagination__indicator artdeco-pagination__indicator--number ">
<button aria-label="Page 23" data-ember-action="" data-ember-action-279="279">
<span>23</span>
</button>
</li>
</ul>
<!----></artdeco-pagination>
</section>
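One way to handle the "…" button while keeping the click-based approach is sketched below. It assumes, from the markup above, that numbered buttons carry aria-label="Page N" while the "…" button has no aria-label, and, as described in the question, that clicking "…" adds more numbers to the footer rather than loading a page. go_to_page is a hypothetical helper, not part of Selenium. It passes the string "css selector", which is the value of Selenium's By.CSS_SELECTOR, to the standard find_elements(by, value) method.

```python
import time

CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR
PAGE_BUTTONS = "ul.artdeco-pagination__pages li button"  # taken from the markup above


def go_to_page(driver, n, max_expansions=3, pause=2.0):
    """Click the footer button for page n, clicking a "…" button first if needed.

    Hypothetical helper. Returns True once the "Page n" button was clicked,
    False if it never appeared.
    """
    for _ in range(max_expansions + 1):
        found = driver.find_elements(CSS, '[aria-label="Page %d"]' % n)
        if found:
            found[0].click()
            return True
        # Page n is not listed yet: pick the last "…" button whose preceding
        # page number is below n (the active page is a <span>, not a button).
        ellipsis = None
        last_number = 0
        for button in driver.find_elements(CSS, PAGE_BUTTONS):
            text = button.text.strip()
            if text.isdigit():
                last_number = int(text)
            elif text == "…" and last_number < n:
                ellipsis = button
        if ellipsis is None:
            return False
        ellipsis.click()  # assumed to reveal more numbered buttons
        time.sleep(pause)  # let the footer re-render before searching again
    return False
```

In the loop from the question, the find_element(...).click() line could become a call to go_to_page(driver, i), still followed by the wait and the re-parse. If clicking "…" on the real site navigates to a page instead of only expanding the footer, this logic would need adjusting.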
Answer 0 (score: 0)
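If possible, I'd suggest building the URLs for the new pages by hand. Many websites simply add or change a page-number parameter in the URL.
For example, take the Urban Outfitters website. Its clothing listings are paginated, and the URL of the first page looks like this (I'm looking at the sale category here):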
https://www.urbanoutfitters.com/sale
If I look at the URL of the second page of sale items, I see that they just add an extra parameter to the URL:
https://www.urbanoutfitters.com/sale?page=2
The same pattern holds for every page after the first. I ran into a problem similar to yours and found this approach much easier and less error-prone.
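A minimal sketch of that idea, using only the standard library. The parameter name page comes from the Urban Outfitters URLs above; page_url and page_urls are hypothetical names made up for illustration. Whether another site (including the one in the question) accepts such a parameter, and what it is called there, has to be checked by looking at its URLs after clicking a page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with its `param` query parameter set to `page`.

    Page 1 gets no parameter, matching the example where the first page is
    just https://www.urbanoutfitters.com/sale.
    """
    parts = urlsplit(base_url)
    # Keep any other query parameters, dropping an old page number if present.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    if page != 1:
        query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(base_url, last_page, param="page"):
    """URLs for pages 1..last_page, in order."""
    return [page_url(base_url, n, param) for n in range(1, last_page + 1)]


for url in page_urls("https://www.urbanoutfitters.com/sale", 3):
    print(url)
```

Each generated URL can then be opened with driver.get(url) (or fetched without a browser, if the site renders its listings server-side) and parsed with Beautiful Soup, so there are no footer buttons to click at all.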