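I have XML data that I need to parse in order to extract certain information. However, there is a catch when I try to extract the name field from the XML using BeautifulSoup.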
I use BeautifulSoup as my standard approach and cannot switch to any other package, so a solution that uses it would be appreciated.
Below is the XML data; the data highlighted in bold needs to be extracted.
<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <customization-element>mydata.core.customization.requirements</customization-element>
        <attribute-type>mydata.attribute_type.list</attribute-type>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
          <option>
            <key>LIST_TYPE</key>
            <value class="java.lang.String">CUSTOM</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <customization-element>mydata.core.customization.teststep</customization-element>
        <attribute-type>mydata.attribute_type.string</attribute-type>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
          <option>
            <key>MAX_CHARACTERS</key>
            <value class="java.lang.String">5000</value>
          </option>
        </options>
        <editable>true</editable>
        <userDefined>true</userDefined>
        <internal>false</internal>
      </attribute-item>
    </attributes>
  </attributes-configuration>
  <test-management/>
</configurations>
Below is my Python code:
import os
from bs4 import BeautifulSoup as bs

fileName = 'Configuration.xml'
fullFile = os.path.abspath(os.path.join('DataTransporter', fileName))
attributeList = []
with open(fullFile) as f:
    soup = bs(f, 'xml')
for attribData in soup.find_all('attribute-item'):
    dat = {
        'attribName' : attribData.name,
        'attribDesc' : attribData.description.text,
        'attribValue' : attribData.options.value.text,
    }
    attributeList.append(dat)
    #for attribParams in soup.find_all(name = 'value'):
        #newdict[attribName.text] = attribParams.text
print(attributeList)
My output:
[{'attribName': 'attribute-item', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'attribute-item', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
Answer 0 (score: 1)
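At first I thought attribData.name.text would do this, but name is a built-in attribute of every BeautifulSoup tag: it holds the tag's own name (here 'attribute-item'), so attribData.name never reaches the <name> child element.
To get the correct value, you can use the findChildren(<key>) method instead:
attribData.findChildren('name')[0].text
findChildren() returns a list, which in this case contains a single element, so [0] picks that element and .text returns its text.
To get the ID, you can use attribData['id'].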
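The difference between a tag's own name and its <name> child can be checked on a small fragment. This is only a minimal sketch, assuming the lxml package is installed (the 'xml' parser needs it); xml_text is a shortened stand-in for Configuration.xml, and item.find('name') is shown as an alternative that returns the first match directly.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for Configuration.xml (illustration only).
xml_text = """
<attributes>
  <attribute-item id="mydata.core.customization.teststep.prerequisite">
    <name>Prerequisite</name>
    <description>User Defined Attribute</description>
  </attribute-item>
</attributes>
"""

soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser requires lxml
item = soup.find("attribute-item")

# Tag.name is the tag's own name, not the <name> child element.
tag_name = item.name

# findChildren() returns a list of matches; [0] picks the first one.
child_name = item.findChildren("name")[0].text

# find() returns the first match directly, so it yields the same text.
same_name = item.find("name").text

# Attributes of a tag are read like dictionary keys.
item_id = item["id"]

print(tag_name, child_name, same_name, item_id)
```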
Putting it together, your code looks like this (inside the for loop):
dat = {
    'attribName' : attribData.findChildren('name')[0].text,
    'id': attribData['id'],
    'attribDesc' : attribData.description.text,
    'attribValue' : attribData.options.value.text,
}
The output looks like this:
[{'attribName': 'priority', 'id': 'mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw', 'attribDesc': 'priority of a requirement', 'attribValue': ' high,low,medium'}, {'attribName': 'Prerequisite', 'id': 'mydata.core.customization.teststep.prerequisite', 'attribDesc': 'User Defined Attribute', 'attribValue': ''}]
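The loop can also be tried end to end without the file on disk. The following is a sketch under assumptions: the XML is a trimmed copy of the data from the question embedded as a string, lxml is installed for the 'xml' parser, and child_text is a hypothetical helper (not part of BeautifulSoup) that returns an empty string when a child element is missing instead of raising AttributeError.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the XML from the question, embedded for illustration.
xml_text = """<configurations>
  <attributes-configuration>
    <attributes>
      <attribute-item id="mydata.core.customization.requirements._noSpwIUSEei1hLMz9D9OBw">
        <name>priority</name>
        <description>priority of a requirement</description>
        <options>
          <option>
            <key>DEFAULT_LIST</key>
            <value class="java.lang.String"> high,low,medium</value>
          </option>
        </options>
      </attribute-item>
      <attribute-item id="mydata.core.customization.teststep.prerequisite">
        <name>Prerequisite</name>
        <description>User Defined Attribute</description>
        <options>
          <option>
            <key>DEFAULT_VALUE</key>
            <value/>
          </option>
        </options>
      </attribute-item>
    </attributes>
  </attributes-configuration>
</configurations>"""


def child_text(tag, child, default=""):
    """Hypothetical helper: text of the first <child> under tag, or default."""
    found = tag.find(child)
    return found.text if found is not None else default


soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser requires lxml

attribute_list = []
for item in soup.find_all("attribute-item"):
    attribute_list.append({
        "attribName": child_text(item, "name"),
        "id": item.get("id", ""),
        "attribDesc": child_text(item, "description"),
        # First <value> under the item, as in attribData.options.value.text
        "attribValue": child_text(item, "value"),
    })

print(attribute_list)
```

The helper only matters when an attribute-item lacks one of the child elements; for the data shown in the question, every lookup succeeds and the empty <value/> element simply yields an empty string.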
Hope this helps!