Error when reading HTML tag attributes with BeautifulSoup

Time: 2017-08-02 09:13:04

Tags: python beautifulsoup python-requests

I am trying to use BeautifulSoup to read and list the text of td elements based on their data-property attribute:
tr = BeautifulSoup(str(input), 'lxml')
tags = tr.findAll('td')
for t in tags:
    if t.attrs['data-property'] == 'OSVersion':
        ver = t.text

This gives me an error with no further detail:

KeyError: 'data-property'

See the sample tr below, which is used as input:

<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
<td class=" resizable reorderable" data-property="PhoneNumber"></td>
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td>
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName">    <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable" data-property="ComplianceStatusName">    <div class="grid_resizable_col">Compliant</div>
</td>

<td class=" resizable reorderable" data-property="IMEI"></td>
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td>
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td>
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td>

<td class=" resizable reorderable" data-property="Notes"></td>
<td class=" resizable reorderable" data-property="WnsStatus">        <span>Disconnected</span>
</td>
<td class=" resizable reorderable" data-property="DmLastSeenTime">    <span class="icon arrow_down_stretched red">-</span>
</td>                    
</tr>

If I take a single dict like the following, it works fine:

d={'class': ['', 'resizable', 'reorderable'], 'data-property': 'FriendlyName'}
print(d['data-property'])

Does anyone know how to fix this?

Thanks

3 answers:

Answer 0 (score: 2)

There is no need to go through attrs; you can index the tag directly:

from bs4 import BeautifulSoup as BS

html = """<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
<td class=" resizable reorderable" data-property="PhoneNumber"></td>
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td>
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName">    <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable" data-property="ComplianceStatusName">    <div class="grid_resizable_col">Compliant</div>
</td>

<td class=" resizable reorderable" data-property="IMEI"></td>
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td>
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td>
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td>
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td>

<td class=" resizable reorderable" data-property="Notes"></td>
<td class=" resizable reorderable" data-property="WnsStatus">        <span>Disconnected</span>
</td>
<td class=" resizable reorderable" data-property="DmLastSeenTime">    <span class="icon arrow_down_stretched red">-</span>
</td>                    
</tr>"""

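soup = BS(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
tags = soup.find_all('td')
for t in tags:
    # indexing a tag looks up the attribute by name
    if t['data-property'] == 'OSVersion':
        ver = t.text
        print(ver)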

Output:

10.2.1
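
As a variation on the loop above, the filtering can be handed to BeautifulSoup itself by passing the attribute to find(). This is only a sketch: the HTML is a shortened copy of the row from the question, and the html.parser backend is an assumption chosen so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample row from the question.
html = """<tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
cell = soup.find("td", attrs={"data-property": "OSVersion"})
ver = cell.get_text(strip=True) if cell is not None else None
print(ver)
```

Because find() returns None when no cell matches, the explicit None check keeps the lookup from failing on rows that lack an OSVersion cell.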

Answer 1 (score: 0)

Yes, you are right: the code raises an error.

Since you are getting a KeyError, make the following change in your code:

if 'data-property' in t.attrs and t.attrs['data-property']== 'OSVersion':

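My answer, based on the demo code:

In BeautifulSoup 3, which the demo below imports, t.attrs returns a list of tuples, for example [(u'class', u' resizable reorderable'), (u'data-property', u'OSVersion')]. In bs4, attrs is already a dict.

We need to convert it to a dictionary with dict(), for example attributes = dict(t.attrs)

In the condition, check whether the key exists, for example if 'data-property' in attributes and attributes['data-property']== 'OSVersion':

Demo: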

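import BeautifulSoup  # BeautifulSoup 3 (legacy package, not bs4)
# data is assumed to hold the HTML string of the <tr> shown in the question
tr = BeautifulSoup.BeautifulSoup(data)
tags = tr.findAll('td')
for t in tags:
    attributes = dict(t.attrs)  # list of (name, value) tuples -> dict
    # test for the key first so a cell without data-property is skipped
    if 'data-property' in attributes and attributes['data-property'] == 'OSVersion':
        ver = t.text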
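
The same key check can be sketched for bs4, where a tag's get() method returns None for a missing attribute instead of raising KeyError. The row below is hypothetical: its first cell deliberately has no data-property attribute, which is one plausible (but unconfirmed) reason the original loop failed on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical row: the first cell has no data-property attribute.
html = """<tr>
<td class="checkbox"></td>
<td data-property="OSVersion">10.2.1</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

ver = None
for t in soup.find_all("td"):
    # t.get() returns None for a missing attribute,
    # whereas t["data-property"] would raise KeyError on the first cell.
    if t.get("data-property") == "OSVersion":
        ver = t.text
print(ver)
```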

Let me know if you have any further questions. Feel free to contact me.

Answer 2 (score: 0)

Here you go. Code:

from bs4 import BeautifulSoup
with open("xmlfile.xml", "r") as f: # opening xml file
    content = f.read() # xml content stored in this variable
soup = BeautifulSoup(content, "lxml")
for values in soup.findAll("td"):
    if  values["data-property"] == "OSVersion":
        print(values.text)

Output:

10.2.1
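
Since the question asks to list the cell text by data-property, one possible extension is to collect every cell into a dict keyed by that attribute. This is a sketch under assumptions: the HTML is a three-cell excerpt of the question's row, the name properties is illustrative, and html.parser stands in for lxml so the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup

# Three cells excerpted from the sample row in the question.
html = """<tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName">    <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable" data-property="IMEI"></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# attrs={"data-property": True} matches only cells that carry the attribute,
# so indexing td["data-property"] inside the comprehension cannot fail.
properties = {
    td["data-property"]: td.get_text(strip=True)
    for td in soup.find_all("td", attrs={"data-property": True})
}
print(properties["OSVersion"])
print(properties["EnrollmentStatusName"])
```

Using get_text(strip=True) removes the padding whitespace around nested div and span content, and empty cells such as IMEI map to an empty string.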