Problem parsing XML data with the BeautifulSoup package in Python. Need fresh eyes for guidance

Time: 2018-08-09 13:59:23

Tags: python xml parsing beautifulsoup

Here is the XML data I need to parse and extract certain information from. However, there is a catch when I try to extract the name field from the XML with BeautifulSoup.

  1. Issue 1: I get the name of the parent tag (attribute-item) instead of the data in the name field, which should be "priority".
  2. Issue 2: I also need to extract the ID from the XML, i.e. the value of the id attribute on the attribute-item element, for example mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw.

BeautifulSoup is our standard approach and I cannot switch to any other package, so a solution that stays with BeautifulSoup would be much appreciated.

Below is the XML data. The name text and the id attribute of each attribute-item are what need to be extracted.

<configurations>
   <attributes-configuration>
      <attributes>
         <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
            <name>priority</name>
            <description>priority of a requirement</description>
            <customization-element>mydata.core.customization.requirements</customization-element>
            <attribute-type>mydata.attribute_type.list</attribute-type>
            <options>
               <option>
                  <key>DEFAULT_LIST</key>
                  <value class="java.lang.String"> high,low,medium</value>
               </option>
               <option>
                  <key>LIST_TYPE</key>
                  <value class="java.lang.String">CUSTOM</value>
               </option>
            </options>
            <editable>true</editable>
            <userDefined>true</userDefined>
            <internal>false</internal>
         </attribute-item>
         <attribute-item id="mydata.core.customization.teststep.prerequisite">
            <name>Prerequisite</name>
            <description>User Defined Attribute</description>
            <customization-element>mydata.core.customization.teststep</customization-element>
            <attribute-type>mydata.attribute_type.string</attribute-type>
            <options>
               <option>
                  <key>DEFAULT_VALUE</key>
                  <value/>
               </option>
               <option>
                  <key>MAX_CHARACTERS</key>
                  <value class="java.lang.String">5000</value>
               </option>
            </options>
            <editable>true</editable>
            <userDefined>true</userDefined>
            <internal>false</internal>
         </attribute-item>
      </attributes>
   </attributes-configuration>
   <test-management/>
</configurations>

Below is my Python code:

import os
from bs4 import BeautifulSoup  as bs  

fileName = 'Configuration.xml'
fullFile = os.path.abspath(os.path.join('DataTransporter', fileName))
attributeList = []
with open(fullFile) as f:
    soup = bs(f, 'xml')

for attribData in soup.find_all('attribute-item'):
    dat = {
            'attribName' : attribData.name,
            'attribDesc' : attribData.description.text,
            'attribValue' : attribData.options.value.text,
          }
    attributeList.append(dat)
    #for attribParams in soup.find_all(name = 'value'):
    #newdict[attribName.text] = attribParams.text
print(attributeList)

My output:

[{'attribName': 'attribute-item', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'attribute-item', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]

1 answer:

Answer 0 (score: 1)

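At first I thought attribData.name.text would do it, but name is a reserved attribute of a BeautifulSoup tag: attribData.name returns the tag's own name ('attribute-item'), not the name child element. To get the right value, use the findChildren(<key>) method, like this: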

attribData.findChildren('name')[0].text

findChildren() returns a list, which in this case has only one element, so it makes sense to take it with [0] and then read .text to get the expected value.
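To see the difference in isolation, here is a minimal sketch. It assumes bs4 and lxml are installed (the 'xml' parser needs lxml) and uses an inline XML string trimmed down from the question. The variable names tag_name, child_text and same_text are only illustrative. It also shows find('name'), which returns the first matching child directly and so avoids the [0].

```python
from bs4 import BeautifulSoup

# Trimmed-down sample taken from the XML in the question.
xml = """<attributes>
  <attribute-item id="mydata.core.customization.teststep.prerequisite">
    <name>Prerequisite</name>
  </attribute-item>
</attributes>"""

soup = BeautifulSoup(xml, "xml")
item = soup.find("attribute-item")

# .name is the tag's own name, not the <name> child element.
tag_name = item.name

# find() returns the first matching child; findChildren() returns a list.
child_text = item.find("name").text
same_text = item.findChildren("name")[0].text

print(tag_name)
print(child_text)
```

Both find('name') and findChildren('name')[0] reach the same element here, so either should work inside the loop.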

To get the ID, use attribData['id']. Altogether, your code looks like this (inside the for loop):

dat = {
    'attribName' : attribData.findChildren('name')[0].text,
    'id': attribData['id'],
    'attribDesc' : attribData.description.text,
    'attribValue' : attribData.options.value.text,
}

The output looks like this:

[{'attribName': 'priority', 'id': 'mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'id': 'mydata.core.customization.teststep.prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
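If you want something you can run without the file on disk, the following is a rough sketch of the same idea wrapped in a function. The function name parse_attribute_items and the 'options' key are my own assumptions, not part of the question, and the XML is a shortened copy of the sample. Instead of reading only the first value, it collects every key/value pair under options, which may or may not be what you need.

```python
from bs4 import BeautifulSoup

# Shortened copy of the XML from the question (assumption: same structure).
SAMPLE_XML = """<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
          <option>
            <key>LIST_TYPE</key>
            <value class="java.lang.String">CUSTOM</value>
          </option>
        </options>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
        </options>
      </attribute-item>
    </attributes>
  </attributes-configuration>
</configurations>"""


def parse_attribute_items(xml_text):
    """Hypothetical helper: return one dict per attribute-item element."""
    soup = BeautifulSoup(xml_text, "xml")
    items = []
    for item in soup.find_all("attribute-item"):
        # Map each option's key text to its value text ('' for <value/>).
        options = {
            opt.find("key").text: opt.find("value").text
            for opt in item.find_all("option")
        }
        items.append({
            "attribName": item.find("name").text,  # child element, not item.name
            "id": item["id"],                      # XML attribute lookup
            "attribDesc": item.find("description").text,
            "options": options,
        })
    return items


result = parse_attribute_items(SAMPLE_XML)
print(result)
```

To use it with the real file, read Configuration.xml and pass its contents to the function in place of SAMPLE_XML.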

Hope this helps!