Beautifulsoup: printing attribute values for divs that share the same class

Asked: 2018-11-21 15:58:50

Tags: python selenium web-scraping beautifulsoup

I am running the following code, which prints the text that follows value=:

from bs4 import BeautifulSoup

# html is the page source (a string) obtained earlier
soup = BeautifulSoup(html, 'lxml')

name = soup.find('input')['value']

print(name)

However, the page has several divs with the same class. I tried findAll, but it raised an error, and I can only print the first field value, i.e. Name.

Please see the attached screenshot (Browser Capture). The relevant HTML:

<div class="control-group"><label class="control-label required" for="client_appbundle_prospecttype_ProspectFirstContact_decision_timeframe">What date do you want to make a decision?</label>
  <div class="controls"><input type="text" id="client_appbundle_prospecttype_ProspectFirstContact_decision_timeframe" name="client_appbundle_prospecttype[ProspectFirstContact][decision_timeframe]" required="required" class="input-small text-bound datepicker hasDatepicker"></div>
</div>

</div>
</div>
</div>
</div>

<div class="tab-pane active" id="prospect_consultation">
  <div class="widget row-fluid">
    <div class="span12">
      <div class="navbar">
        <div class="navbar-inner">
          <h6>Personal details</h6>
        </div>
      </div>
      <div class="well">
        <div class="control-group">
          <label class="control-label">Name</label>
          <div class="controls">
            Sam Test-March 2018
          </div>
        </div>

        <div class="control-group">
          <label class="control-label">Address and postcode</label>
          <div class="controls">

          </div>
        </div>

        <div class="control-group">
          <label class="control-label">Mobile number</label>
          <div class="controls">
            12345678
          </div>
        </div>

        <div class="control-group">
          <label class="control-label">Email address</label>
          <div class="controls">
            test@test.com
          </div>
        </div>

Thanks!

1 answer:

Answer 0 (score: 0)

Maybe something like this:

from bs4 import BeautifulSoup
html = '''
<html>
 <head></head>
 <body>
  <div class="control-group">
   <label class="control-label required" for="client_appbundle_prospecttype_ProspectFirstContact_decision_timeframe">What date do you want to make a decision?</label>
   <div class="controls">
    <input type="text" id="client_appbundle_prospecttype_ProspectFirstContact_decision_timeframe" name="client_appbundle_prospecttype[ProspectFirstContact][decision_timeframe]" required class="input-small text-bound datepicker hasDatepicker">
   </div>
  </div>     
  <div class="tab-pane active" id="prospect_consultation"> 
   <div class="widget row-fluid"> 
    <div class="span12"> 
     <div class="navbar"> 
      <div class="navbar-inner"> 
       <h6>Personal details</h6> 
      </div> 
     </div> 
     <div class="well"> 
      <div class="control-group"> 
       <label class="control-label">Name</label> 
       <div class="controls">
         Sam Test-March 2018 
       </div> 
      </div> 
      <div class="control-group"> 
       <label class="control-label">Address and postcode</label> 
       <div class="controls"> 
       </div> 
      </div> 
      <div class="control-group"> 
       <label class="control-label">Mobile number</label> 
       <div class="controls">
         12345678 
       </div> 
      </div> 
      <div class="control-group"> 
       <label class="control-label">Email address</label> 
       <div class="controls">
         test@test.com 
       </div> 
      </div>
     </div>
    </div>
   </div>
  </div>
 </body>
</html>
'''
soup = BeautifulSoup(html, "lxml")

# Select every element with class "controls", then print the non-empty text values.
items = soup.select('.controls')
print([item.text.strip() for item in items if item.text.strip()])
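
If the field names are needed as well, one possible variation is to walk each control-group and pair its label with its value. This is only a sketch under some assumptions: the sample markup is a shortened stand-in for the page in the question, the "Decision date" label and the value="2018-12-01" attribute are hypothetical (the input in the question carries no value attribute), and html.parser is used instead of lxml just to avoid the extra dependency. It also hints at the likely cause of the findAll error: find_all returns a list-like ResultSet, so ['value'] has to be applied to each element rather than to the result as a whole, and .get('value') avoids a KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the page in the question.
# The "Decision date" label and the value attribute are hypothetical.
sample_html = """
<div class="control-group">
  <label class="control-label required">Decision date</label>
  <div class="controls"><input type="text" name="decision_timeframe" value="2018-12-01"></div>
</div>
<div class="control-group">
  <label class="control-label">Name</label>
  <div class="controls">
    Sam Test-March 2018
  </div>
</div>
<div class="control-group">
  <label class="control-label">Address and postcode</label>
  <div class="controls">
  </div>
</div>
<div class="control-group">
  <label class="control-label">Mobile number</label>
  <div class="controls">
    12345678
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

fields = {}
for group in soup.select("div.control-group"):
    label = group.find("label", class_="control-label")
    controls = group.find("div", class_="controls")
    if label is None or controls is None:
        continue  # skip groups that do not follow the label/controls layout

    field = controls.find("input")
    if field is not None:
        # Form field: read the value attribute; .get() returns the default if it is absent.
        value = field.get("value", "")
    else:
        # Plain text field: take the stripped text of the controls div.
        value = controls.get_text(strip=True)

    fields[label.get_text(strip=True)] = value

print(fields)
```

Empty fields such as "Address and postcode" are kept with an empty string here, so the result still shows which labels exist on the page; filter them out if only filled-in values matter.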