Using BeautifulSoup to extract the page count from a <span> tag

Posted: 2016-11-14 16:42:57

Tags: python-3.x beautifulsoup python-requests

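Hi all, I'm just getting started with Python + BeautifulSoup4 + Requests. I need to extract the text of a span tag that has no id or class, so that I can get the number of pages I have to go through and scrape. This is the part of the page I need to extract it from: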

<div class="pagination">
    <select id="CurrentPage" data-val="true" data-val-number="The field CurrentPage must be a number." data-val-required="The CurrentPage field is required." name="CurrentPage">
    <option selected="selected" value="1">1</option>
    <option value="2">2</option>
    <option value="3">3</option>
    <option value="4">4</option>
    <option value="5">5</option>
    <option value="6">6</option>
    <option value="7">7</option>
    <option value="8">8</option>
    <option value="9">9</option>
    <option value="10">10</option>
    <option value="11">11</option>
    <option value="12">12</option>
    <option value="13">13</option>
    <option value="14">14</option>
    <option value="15">15</option>
    <option value="16">16</option>
    <option value="17">17</option>
    <option value="18">18</option>
    <option value="19">19</option>
    <option value="20">20</option>
    <option value="21">21</option>
    <option value="22">22</option>
    <option value="23">23</option>
    <option value="24">24</option>
    <option value="25">25</option>
    <option value="26">26</option>
    <option value="27">27</option>
    <option value="28">28</option>
    </select>
    <span>of 28</span>
    <a class="btn next" href="/listings/trucks/for-sale/list/category/27/trucks/manufacturer/international/model-group/9400?page=2">Next »</a>
</div>

1 answer:

Answer 0 (score: 0)

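import re

# `soup` is the BeautifulSoup object for the page; match the <span> by its text, e.g. "of 28".
span_tag = soup.find('span', string=re.compile(r'of \d+'))
# Keep only the digits; str.rstrip('of ') would leave the leading "of " in place.
page_num = int(re.search(r'\d+', span_tag.text).group())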
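For reference, here is a minimal self-contained sketch of the same idea run against a trimmed copy of the markup from the question. It also falls back to the highest value in `select#CurrentPage` if the span is missing, and builds one URL per page from the "Next" link. The function names `get_page_count` and `page_urls` are hypothetical, `https://www.example.com/` is a placeholder for the real host (which the question does not give), and it assumes the site paginates purely through the `?page=N` query parameter seen in the "Next" href.

```python
import re
from urllib.parse import parse_qs, urlencode, urljoin, urlparse, urlunparse

from bs4 import BeautifulSoup

# Trimmed copy of the pagination markup from the question (options 2-27 omitted).
HTML = """
<div class="pagination">
  <select id="CurrentPage" name="CurrentPage">
    <option selected="selected" value="1">1</option>
    <option value="28">28</option>
  </select>
  <span>of 28</span>
  <a class="btn next" href="/listings/trucks/for-sale/list/category/27/trucks/manufacturer/international/model-group/9400?page=2">Next »</a>
</div>
"""


def get_page_count(soup):
    """Return the total page count, or None if it cannot be found (hypothetical helper)."""
    # The <span> has no id or class, so match it by its text, e.g. "of 28".
    span = soup.find("span", string=re.compile(r"of\s+\d+"))
    if span is not None:
        return int(re.search(r"\d+", span.get_text()).group())
    # Fallback: the highest value among the <option>s of select#CurrentPage.
    options = soup.select("select#CurrentPage option")
    if options:
        return max(int(opt["value"]) for opt in options)
    return None


def page_urls(soup, base_url, count):
    """Build one URL per page from the "Next" link (hypothetical helper)."""
    # Assumption: the site paginates with a plain ?page=N query parameter,
    # as the "Next" link's href suggests.
    next_link = soup.select_one("div.pagination a.next")
    parts = urlparse(urljoin(base_url, next_link["href"]))
    query = parse_qs(parts.query)
    urls = []
    for n in range(1, count + 1):
        query["page"] = [str(n)]
        urls.append(urlunparse(parts._replace(query=urlencode(query, doseq=True))))
    return urls


soup = BeautifulSoup(HTML, "html.parser")
count = get_page_count(soup)
print(count)  # 28
# Placeholder host; substitute the site you are actually scraping.
urls = page_urls(soup, "https://www.example.com/", count)
print(urls[0])
```

Each URL in `urls` can then be fetched in turn with `requests.get(...)` and parsed the same way. Matching on the `of \d+` text rather than on position keeps the lookup working even if other unlabelled `<span>` tags appear on the page.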