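Parsing this data with Beautiful Soup

Time: 2016-11-14 00:14:13

Tags: beautifulsoup

I have pasted some HTML data below. It is not the whole page; there are many other divs and other things I don't need. This is the part most relevant to the example.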

<div class="prox_sub22"><a onclick="return srpVehicleNameGAEvent();"
  data-bind="attr:{href:VehicleDetailUrl, id: GetVehicleTitleId()}, html:AN.DSF.Data.SearchObjects.GetShortenVehicleName(Year+' '+Make+' '+Model+' '+StyleName)"
  href="/new/2016/ford/expedition-el/king ranch/1FMJK1JTXGEF06926"
  id="Vehicle_Title_0">2016 Ford Expedition EL 4WD 4d...</a>
</div>
<div class="prod_vin txt11"><span data-bind="attr:{ id: GetVinLabelId()}"
  id="VIN_Label_0">VIN:</span>
  <span data-bind="attr:{ id: GetVehicleVinId()}, text:Vin"
  id="Vehicle_Vin_0">1FMJK1JTXGEF06926</span>
</div>
<div class="prox_sub22"><a onclick="return srpVehicleNameGAEvent();"
  data-bind="attr:{href:VehicleDetailUrl, id: GetVehicleTitleId()}, html:AN.DSF.Data.SearchObjects.GetShortenVehicleName(Year+' '+Make+' '+Model+' '+StyleName)"
  href="/new/2017/ford/super-duty-f--250-srw/xl/1FT7W2BT3HEC24765"
  id="Vehicle_Title_1">2017 Ford Super Duty F-250 SRW...</a>
</div>
<div class="prod_vin txt11"><span data-bind="attr:{ id: GetVinLabelId()}"
  id="VIN_Label_1">VIN:</span>
  <span data-bind="attr:{ id: GetVehicleVinId()}, text:Vin"
  id="Vehicle_Vin_1">1FT7W2BT3HEC24765</span>
</div>

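I have pasted only a small part of the HTML file for reference... there are 72 of these on the page (numbered starting from 0). I am trying to loop over the page and collect every Vehicle_Vin on it.

Here is my code: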

soup = BeautifulSoup(f, "html.parser")
stock = soup.find_all("div", attrs={"class": "prod_vin txt11"})

for j in range(0, 72):

    vinlist = []

    for subvins in stock:
        vins = subvins.find_all("span", attrs={"id": "Vehicle_Vin_{}".format(j)})
    for vin in vins:
        vinlist.append(vin.string)
    print(j, vinlist)

The problem is that when this runs I keep getting an empty list, and I can't figure out why.

Any suggestions are greatly appreciated.

2 answers:

Answer 0: (score: 0)

doc = """
<div class="prox_sub22"><a onclick="return srpVehicleNameGAEvent();" data-bind="attr:{href:VehicleDetailUrl, id: GetVehicleTitleId()}, html:AN.DSF.Data.SearchObjects.GetShortenVehicleName(Year+' '+Make+' '+Model+' '+StyleName)" href="/new/2016/ford/expedition-el/king ranch/1FMJK1JTXGEF06926" id="Vehicle_Title_0">2016 Ford Expedition EL 4WD 4d...</a></div>
  <div class="prod_vin txt11"><span data-bind="attr:{ id: GetVinLabelId()}" id="VIN_Label_0">VIN:</span> <span data-bind="attr:{ id: GetVehicleVinId()}, text:Vin" id="Vehicle_Vin_0">1FMJK1JTXGEF06926</span></div>
  <div class="prox_sub22"><a onclick="return srpVehicleNameGAEvent();" data-bind="attr:{href:VehicleDetailUrl, id: GetVehicleTitleId()}, html:AN.DSF.Data.SearchObjects.GetShortenVehicleName(Year+' '+Make+' '+Model+' '+StyleName)" href="/new/2017/ford/super-duty-f--250-srw/xl/1FT7W2BT3HEC24765" id="Vehicle_Title_1">2017 Ford Super Duty F-250 SRW...</a></div>
  <div class="prod_vin txt11"><span data-bind="attr:{ id: GetVinLabelId()}" id="VIN_Label_1">VIN:</span> <span data-bind="attr:{ id: GetVehicleVinId()}, text:Vin" id="Vehicle_Vin_1">1FT7W2BT3HEC24765</span></div>
"""

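from bs4 import BeautifulSoup
import re

# The 'lxml' parser needs the lxml package installed; "html.parser" also works here.
soup = BeautifulSoup(doc, 'lxml')

# Match every element whose id looks like Vehicle_Title_<number>.
for i in soup.find_all(id=re.compile(r"Vehicle_Title_\d+")):
    print(i.get('id'), i.get_text(), sep='\t\t')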

Output:

Vehicle_Title_0     2016 Ford Expedition EL 4WD 4d...
Vehicle_Title_1     2017 Ford Super Duty F-250 SRW...
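
The question asks for the Vehicle_Vin values rather than the titles, so the same regex idea could be pointed at the VIN spans instead. The following is only a minimal sketch under some assumptions: the `html` string is a trimmed stand-in for the real page (the data-bind attributes are dropped), and the names `html`, `vin_pattern` and `vins` are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Trimmed stand-in for the page: the two VIN blocks from the question,
# with the data-bind attributes removed for brevity (an assumption).
html = """
<div class="prod_vin txt11"><span id="VIN_Label_0">VIN:</span>
  <span id="Vehicle_Vin_0">1FMJK1JTXGEF06926</span></div>
<div class="prod_vin txt11"><span id="VIN_Label_1">VIN:</span>
  <span id="Vehicle_Vin_1">1FT7W2BT3HEC24765</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match every span whose id is Vehicle_Vin_<number>, however many there are,
# so the count of 72 does not need to be hard-coded.
vin_pattern = re.compile(r"^Vehicle_Vin_\d+$")
vins = [span.get_text(strip=True) for span in soup.find_all("span", id=vin_pattern)]

print(vins)
```

Because the pattern matches any numeric suffix, this should keep working if the page lists more or fewer vehicles.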

Answer 1: (score: 0)

I now realize that I need to put vinlist = [] outside the for j in range() loop.
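
A rough sketch of what that change might look like is below. It rests on a few assumptions: the `html` string is a cut-down stand-in for the real page, `range(len(stock))` replaces the hard-coded 72 on the guess that each div holds one VIN, and the `for vin in vins` loop is nested inside the div loop. As pasted in the question, that loop sits outside the div loop, so only the last div's matches would be kept; whether that was the real indentation or a paste artifact is not clear.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the page (assumption): two VIN blocks from the question.
html = """
<div class="prod_vin txt11"><span id="VIN_Label_0">VIN:</span>
  <span id="Vehicle_Vin_0">1FMJK1JTXGEF06926</span></div>
<div class="prod_vin txt11"><span id="VIN_Label_1">VIN:</span>
  <span id="Vehicle_Vin_1">1FT7W2BT3HEC24765</span></div>
"""

soup = BeautifulSoup(html, "html.parser")
stock = soup.find_all("div", attrs={"class": "prod_vin txt11"})

# Created once, before the loop, so it is not reset on every value of j.
vinlist = []

for j in range(len(stock)):
    for subvins in stock:
        vins = subvins.find_all("span", attrs={"id": "Vehicle_Vin_{}".format(j)})
        # Append inside the div loop so matches from every div are kept,
        # not only those from the last div visited.
        for vin in vins:
            vinlist.append(vin.string)

print(vinlist)
```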

Thanks, everyone, for the help!