Python web scraping

Date: 2019-02-25 20:36:19

Tags: python-3.x web-scraping beautifulsoup jupyter-notebook html-parsing

I just want to use Python to extract data from the HTML below (I need data = 20%). Any help would be appreciated.

<div class="ratings-container">
  <div class="ratings">
    <div class="ratings active" style="width: 20%"></div>
  </div>
</div>

I don't know how to get the contents of the style attribute. Code like the following returns NULL:

# `tag` is the BeautifulSoup object for the page
mratingNew = tag.findAll('div', attrs={"class": "ratings active"})
for i in range(len(mratingNew)):
    print(mratingNew[i]['style'])
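For comparison with the loop above, here is a minimal sketch of the selection step. It assumes the snippet is parsed into a `BeautifulSoup` object named `soup`, and it swaps in the CSS selector `div.ratings.active` via `select()`. That selector matches `<div>` elements carrying both classes in any order, so it is slightly more forgiving than matching the exact `class` string.

```python
from bs4 import BeautifulSoup

html = '''<div class="ratings-container">
  <div class="ratings">
    <div class="ratings active" style="width: 20%"></div>
  </div>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# "div.ratings.active" requires BOTH classes; the outer <div class="ratings">
# lacks "active", so only the inner <div> is matched.
styles = [div["style"] for div in soup.select("div.ratings.active")]
print(styles)  # ['width: 20%']
```

Iterating over the returned list directly avoids indexing one list by the length of another.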

1 answer:

Answer 0 (score: 0)

You can get the width with `find` and then split the `style` value on `:`:

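from bs4 import BeautifulSoup

html = '''<div class="ratings-container">
  <div class="ratings">
    <div class="ratings active" style="width: 20%"></div>
  </div>
</div>'''

soup = BeautifulSoup(html, "html.parser")
# find returns the first matching <div>; its attributes are read like a dict
finddiv = soup.find('div', attrs={'class': 'ratings active'})
style = finddiv['style']  # "width: 20%"

# keep only the part after the first ':' and trim the surrounding whitespace
style = style.split(':', 1)[-1].strip()
print(style)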

Output:

20%

If the same class name appears several times with different widths, for example:

html = '''<div class="ratings-container">
  <div class="ratings">
    <div class="ratings active" style="width: 20%"></div>
    <div class="ratings active" style="width: 40%"></div>
    <div class="ratings active" style="width: 30%"></div>
  </div>
</div>'''

then you need to use `findAll` and split each `style` value one by one:

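soup = BeautifulSoup(html, "html.parser")
# findAll returns every matching <div>, in document order
rating_divs = soup.findAll('div', attrs={'class': 'ratings active'})
for div in rating_divs:
    # split each style value on the first ':' and trim the whitespace
    width = div['style'].split(':', 1)[-1].strip()
    print(width)

Output:

20%
40%
30%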
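Splitting on the first `:` assumes the `style` attribute holds only a single declaration. If it could contain several (for example `width: 20%; color: red`), a slightly more defensive sketch parses the whole inline style into a dict and then converts the percentage to a number. `parse_style` and `width_percent` are hypothetical helper names introduced here for illustration, not BeautifulSoup APIs.

```python
from bs4 import BeautifulSoup

html = '''<div class="ratings-container">
  <div class="ratings">
    <div class="ratings active" style="width: 20%"></div>
    <div class="ratings active" style="width: 40%"></div>
    <div class="ratings active" style="width: 30%"></div>
  </div>
</div>'''


def parse_style(style):
    """Hypothetical helper: split an inline CSS string such as
    'width: 20%; color: red' into {'width': '20%', 'color': 'red'}."""
    declarations = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:  # skip empty pieces, e.g. after a trailing ';'
            declarations[name.strip().lower()] = value.strip()
    return declarations


def width_percent(div):
    """Hypothetical helper: return the width as a float (20.0 for 'width: 20%'),
    or None when there is no percentage width."""
    width = parse_style(div.get("style", "")).get("width")
    if width is None or not width.endswith("%"):
        return None
    return float(width[:-1])


soup = BeautifulSoup(html, "html.parser")
widths = [width_percent(div) for div in soup.select("div.ratings.active")]
print(widths)  # [20.0, 40.0, 30.0]
```

Using `div.get("style", "")` instead of `div["style"]` means a matching `<div>` without a `style` attribute yields `None` rather than raising `KeyError`.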