BeautifulSoup: getting label and span value content as pairs

Time: 2016-11-19 09:35:50

Tags: python beautifulsoup

I am trying to extract "Slutrengøring alm. (DKK 750,00) DKK" from this HTML:

<div id="bookingpartoptionalitems" class="paddingLeft paddingRight">
<div class="title paddingTop">Valgfrie tilkøb:</div>
<div class="dots dotsHeight alignment-line">
    <div class="alignment-container optional-items-controlarea"><span class="control-area checkboxArea paddingRight negMarginTop">  <input id="fvF3625F31BE0A4F0A8DCD3F59477CD535" type="checkbox" class="checkbox" value="1"></span>
    </div>
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName paddingRight"><label for="fvF3625F31BE0A4F0A8DCD3F59477CD535">Håndklæder (leje)</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">112,00</span>)</span>
        </span>
    </div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice paddingLeft"><span class="currency">DKK</span> <span class="value">0,00</span></span>
    </div>
    <div class="alignment-container"></div>
</div>
<div class="dots dotsHeight alignment-line">
    <div class="alignment-container optional-items-controlarea"><span class="control-area checkboxArea paddingRight negMarginTop"><input id="fvC7796D75FE6D429187EB9705D87B0289" type="checkbox" class="checkbox" value="1"></span>
    </div>
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName paddingRight"><label for="fvC7796D75FE6D429187EB9705D87B0289">Slutrengøring alm.</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">750,00</span>)</span>
        </span>
    </div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice paddingLeft"><span class="currency">DKK</span> <span class="value">0,00</span></span>
    </div>
    <div class="alignment-container"></div>
</div>
<div class="dots dotsHeight alignment-line">
    <div class="alignment-container optional-items-controlarea"><span class="control-area checkboxArea paddingRight negMarginTop"><input id="fv64F0EAE9857F4D219BB3EDE247ED6EA8" type="checkbox" class="checkbox" value="1"></span>
    </div>
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName paddingRight"><label for="fv64F0EAE9857F4D219BB3EDE247ED6EA8">Leje Sengelinnede </label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">112,00</span>)</span>
        </span>
    </div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice paddingLeft"><span class="currency">DKK</span> <span class="value">0,00</span></span>
    </div>
    <div class="alignment-container"></div>
</div>
<div class="dots dotsHeight alignment-line last-item">
    <div class="alignment-container optional-items-controlarea"><span class="control-area checkboxArea paddingRight negMarginTop"><input id="fvF418ABD7452A45C2B22F98AE5348B13F" type="checkbox" class="checkbox" value="1"></span>
    </div>
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName paddingRight"><label for="fvF418ABD7452A45C2B22F98AE5348B13F">Internet</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">149,00</span>)</span>
        </span>
    </div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice paddingLeft"><span class="currency">DKK</span> <span class="value">0,00</span></span>
    </div>
    <div class="alignment-container"></div>
</div>
</div>

I tried bsObj.select("#bookingpartoptionalitems label"), which outputs:

[<label for="fvEC6D027BF92643FB915F1B3D40C2ADAC">Sengetøjspakke</label>, <label for="fv4C0AAC0318FC408C9D42A6EC152AE878">Barnestol</label>, <label for="fv1B2B8ADFBAA74CE094B55514FF02674F">Barneseng</label>, <label for="fvCA3BB2602AD44C07A1F38B430A73D699">Ekstra Fryser (100L) inkl. levering</label>, <label for="fv7F8D503E6BE84A78A54C92001C195DCA">Levering/afhentning tilkøbte varer</label>, <label for="fv62D7E7BCC1914FBB82802AF9A0D10B27">Trækvogn</label>, <label for="fvF3D92DC8F8BC43F48525A9D032A6130F">Afbestillingsforsikring (ingen selvrisiko)</label>, <label for="fv3CED5B2C3ADC4309A3B7EEA11BBC924D">Kombiforsikring (ingen selvrisiko)</label>, <label for="fv5BC0B453EA5A42E19BFCAC87739CC515">Beach Bowl Key2Activity</label>]

bsObj.select("#bookingpartoptionalitems .value") outputs:

[<span class="value">105,00</span>, <span class="value">0,00</span>, <span class="value">0,00</span>, <span class="value">0,00</span>, <span class="value">0,00</span>, <span class="value">0,00</span>, <span class="value">300,00</span>, <span class="value">0,00</span>, <span class="value">140,00</span>, <span class="value">0,00</span>, <span class="value">125,00</span>, <span class="value">0,00</span>, <span class="value">243,00</span>, <span class="value">0,00</span>, <span class="value">360,00</span>, <span class="value">0,00</span>, <span class="value">119,00</span>, <span class="value">0,00</span>]

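Is there a way to get each label together with its value as a pair? I cannot use the label's for="fvC7796D75FE6D429187EB9705D87B0289" attribute, because it appears to be generated dynamically.

I hope someone can help.

2 answers:

Answer 0: (score: 1)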

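So you want to get all the label/value pairs? One way is to run the two queries you already tried and combine the data, since I believe the results come back in document order. Note that the .value query returns two spans per item (the unit price, then the total price), so you would pair each label with every other value. Or you can do this: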

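# Each name-area div holds one label and its unit price
items = bsObj.find_all('div', class_='optional-items-namearea')

for item in items:
    # item.label is the first <label> inside the div; the price sits in a nested span
    print(item.label.get_text(), item.find('span', class_='value').get_text())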

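This finds all div elements with the class "optional-items-namearea", then iterates over them and extracts the text inside each label. For the value, you need to use find, because the span is nested inside another element.

For your sample data, the output will be: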

Håndklæder (leje) 112,00
Slutrengøring alm. 750,00
Leje Sengelinnede  112,00
Internet 149,00
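
The first approach mentioned above (running both queries and combining the results) could be sketched roughly as follows. This is only an illustration under some assumptions: the html string is a trimmed-down stand-in for the markup in the question, html.parser is used so that no extra parser needs to be installed, and every item is assumed to contribute exactly two .value spans (unit price first, then total price). If an item ever lacks one of them, the pairing would drift, which is why the per-item lookup shown earlier is the more robust choice.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (two items only).
html = """
<div id="bookingpartoptionalitems">
  <div class="alignment-line">
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName"><label for="a">Slutrengøring alm.</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">750,00</span>)</span></span></div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice"><span class="currency">DKK</span> <span class="value">0,00</span></span></div>
  </div>
  <div class="alignment-line">
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName"><label for="b">Internet</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">149,00</span>)</span></span></div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice"><span class="currency">DKK</span> <span class="value">0,00</span></span></div>
  </div>
</div>
"""

bsObj = BeautifulSoup(html, "html.parser")

# The two queries from the question; select() returns matches in document order.
labels = bsObj.select("#bookingpartoptionalitems label")
values = bsObj.select("#bookingpartoptionalitems .value")

# Assumption: two .value spans per item, so every other one is a unit price.
unit_prices = values[::2]

pairs = [
    (label.get_text(strip=True), value.get_text(strip=True))
    for label, value in zip(labels, unit_prices)
]
print(pairs)
```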

Answer 1: (score: 1)

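from bs4 import BeautifulSoup

# html is assumed to hold the markup from the question
soup = BeautifulSoup(html, 'lxml')
# A multi-class string matches the class attribute exactly as written
divs = soup.find_all(class_="alignment-container optional-items-namearea")

for div in divs:
    # strip=True trims each text fragment before joining them
    pair = div.get_text(strip=True)
    print(pair)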

Out:

Håndklæder (leje)(DKK112,00)
Slutrengøring alm.(DKK750,00)
Leje Sengelinnede(DKK112,00)
Internet(DKK149,00)
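
Because get_text(strip=True) joins the fragments without spaces, the result differs slightly from the string the question asks for. One possible variation, offered as a sketch rather than a definitive solution, is to pick the label, currency, and value out of each row and format them explicitly. The helper name format_items and the trimmed html string are made up for this illustration, and html.parser is used only so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (two items only).
html = """
<div id="bookingpartoptionalitems">
  <div class="alignment-line">
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName"><label for="a">Slutrengøring alm.</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">750,00</span>)</span></span></div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice"><span class="currency">DKK</span> <span class="value">0,00</span></span></div>
  </div>
  <div class="alignment-line">
    <div class="alignment-container optional-items-namearea"><span class="BookingDataItemName"><label for="b">Internet</label> <span class="BookingDataItemUnitPrice">(<span class="currency">DKK</span> <span class="value">149,00</span>)</span></span></div>
    <div class="alignment-container"><span class="BookingDataItemTotalPrice"><span class="currency">DKK</span> <span class="value">0,00</span></span></div>
  </div>
</div>
"""


def format_items(soup):
    """Return one 'Label (CUR value)' string per optional item (hypothetical helper)."""
    lines = []
    # Scope every lookup to a single name-area div so label and price stay paired.
    for row in soup.select("#bookingpartoptionalitems .optional-items-namearea"):
        label = row.select_one("label").get_text(strip=True)
        price = row.select_one(".BookingDataItemUnitPrice")
        currency = price.select_one(".currency").get_text(strip=True)
        value = price.select_one(".value").get_text(strip=True)
        lines.append(f"{label} ({currency} {value})")
    return lines


soup = BeautifulSoup(html, "html.parser")
for line in format_items(soup):
    print(line)
```

Since the lookups never touch the for attribute, the dynamically generated ids do not matter here.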