Beginner web scraping question (extracting values from code that follows the same pattern)

Time: 2021-04-20 17:17:00

Tags: python web-scraping

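How can I extract the value "1" from <span>1</span> in the HTML below? (This is only part of the full page. Later I will write a for loop to extract all the values between <span> and </span>, since the rest of the code follows the same pattern.) Thanks!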

    <div class="ipl-ratings-bar">
            <span class="rating-other-user-rating">
            <svg class="ipl-icon ipl-star-icon  " xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
                <path d="M0 0h24v24H0z" fill="none"></path>
                <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
                <path d="M0 0h24v24H0z" fill="none"></path>
            </svg>
                <span>1</span><span class="point-scale">/10</span>
            </span>
    </div>




My code:
    ratetable=info_results.find_all('div', {'class': 'ipl-ratings-bar'})
    valuetable=ratetable.find_all('span')

It ends up showing:

    AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

1 answer:

Answer 0: (score: 0)

For this kind of problem, it may be worth learning about XPath.

In [1]: from lxml import html

In [2]: string = '''<div class="ipl-ratings-bar">
   ...:             <span class="rating-other-user-rating">
   ...:             <svg class="ipl-icon ipl-star-icon  " xmlns="http://www.w3.org/2000/svg" fill="#000000" height="24" viewBox="0 0 24 24" width="24">
   ...:                 <path d="M0 0h24v24H0z" fill="none"></path>
   ...:                 <path d="M12 17.27L18.18 21l-1.64-7.03L22 9.24l-7.19-.61L12 2 9.19 8.63 2 9.24l5.46 4.73L5.82 21z"></path>
   ...:                 <path d="M0 0h24v24H0z" fill="none"></path>
   ...:             </svg>
   ...:                 <span>1</span><span class="point-scale">/10</span>
   ...:             </span>
   ...:     </div>'''

In [3]: root = html.fromstring(string)

In [4]: root.xpath('//div[@class="ipl-ratings-bar"]/span/span[1]/text()')
Out[4]: ['1']
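
If you would rather stay with BeautifulSoup, the error in the question comes from calling find_all() on the ResultSet that the first find_all() returned. A ResultSet behaves like a list, so each div has to be searched on its own. Below is a minimal sketch of that loop, not a definitive solution. It assumes the page repeats the same ipl-ratings-bar block for every rating. The second block (with the value 7) is made-up sample data, the svg element is left out for brevity, and the names html_doc, soup and ratings are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup: the first block mirrors the question (svg omitted);
# the second block is hypothetical, added only to show the loop.
html_doc = """
<div class="ipl-ratings-bar">
    <span class="rating-other-user-rating">
        <span>1</span><span class="point-scale">/10</span>
    </span>
</div>
<div class="ipl-ratings-bar">
    <span class="rating-other-user-rating">
        <span>7</span><span class="point-scale">/10</span>
    </span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

ratings = []
# find_all() returns a ResultSet (list-like), so iterate over it
# instead of calling find_all() on the ResultSet itself.
for bar in soup.find_all("div", class_="ipl-ratings-bar"):
    outer = bar.find("span", class_="rating-other-user-rating")
    if outer is None:
        continue  # this block has no rating, skip it
    value = outer.find("span")  # first span nested inside the outer span
    if value is not None:
        ratings.append(value.get_text(strip=True))

print(ratings)  # ['1', '7']
```

In the question's code, the same idea would mean looping over ratetable and calling find() or find_all() on each element rather than on ratetable itself.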