Python - Trouble parsing XML

Time: 2015-12-29 12:08:33

Tags: python xml xml-parsing

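I am having trouble parsing this XML with Python. I have tried different approaches (minidom, ElementTree), but I can't get the hang of it.

What I want from the XML is this: for each <SPB name="">, I want to know whether the attribute "value" in the tag <NTP_BOOL name="Active" value="true"/> is TRUE or FALSE.

If it is TRUE, I want to extract the attribute "value" from the tag <NTP_TXT name="Polygon" value="some data"/>.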

XML:

<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<SPB name="PolygonList">
  <SPB name="0">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="((2046,9;365,77)(2044,5;366,64)(2034,8;371,17)(2026,1;374,19)(2016,9;375,71)(2008;376,35)(1996,4;375,06)(1985,4;371,6)(1973,7;365,56)(1963,4;360,16)(1954,8;356,05)(1945,7;352,38)(1936;351,3)(1926,5;350,87)(1915,9;351,52)(1907,5;353,9)(1899,6;357,13)(1893,2;361,45)(1886;366,64)(1878,5;371,82)(1870,7;376,79)(1863;381,1)(1854,6;383,91)(1848,9;386,29)(1842,3;387,58)(1834,5;388,23)(1825,4;387,58)(1817,5;387,15)(1809,6;385,21)(1803,7;383,26)(1797;380,02)(1790,4;376,35)(1784,8;371,82)(1778,1;366,42)(1771,4;359,08)(1765,1;350,01)(1760,2;338,56)(1756,5;327,77)(1754,1;315,24)(1753,5;305,31)(1753,8;294,86)(1752,9;285,57)(1751,5;275,66)(1748,5;268,09)(1744,1;260,96)(1737;253,82)(1729,2;249,22)(1720,1;245,12)(1713;243,6)(1705;242,96)(1678,8;242,74)(1633,7;244,68)(1578,6;246,41)(1544,8;247,5)(1510,7;249,22)(1476,9;250,09)(1443,1;251,39)(1410,1;253,33)(1376,1;254,41)(1342,9;255,71)(1309,6;257,01)(1274,9;258,3)(1241,9;259,38)(1207,9;260,47)(1175,4;261,76)(1162,1;261,98)(1155,2;261,55)(1147,4;258,95)(1137,6;255,28)(1128,6;251,17)(1120,3;248,58)(1111,3;247,5)(1102,3;247,5)(1094;248,58)(1086;251,39)(1078,5;256,57)(1071,9;262,41)(1065,4;270,63)(1059,2;283,38)(1055,3;293,97)(1052,3;301,11)(1047,5;309,1)(1042,6;316,67)(1036,6;323,8)(1027,7;333,53)(1013,2;349,53)(992,61;370,12)(984,18;379,2)(952,3;413,71)(910,64;456,69)(872,36;496,8)(835,42;535,77)(771,58;603,19)(738,36;638,48)(685,85;693,42)(638,23;743,91)(604,57;779,67)(580,79;805,5)(574,1;813,92)(568,5;821,91)(558,42;837,67)(550,85;849,77)(537,01;871,88)(518,72;902,85)(503,58;927,12)(487,09;955,07)(481,87;964,44)(478,42;971,22)(475,83;977,27)(474,53;983,53)(474,1;992,17)(474,32;1003,2)(476,69;1012,8)(480,36;1022,9)(483,28;1031,5)(484,58;1038,5)(484,8;1050,3)(485,01;1059,2)(483,07;1069,6)(479,61;1078,7)(474,42;1089,2)(466,42;1099,1)(458,64;1106,2)(449,34;1112,1)(442,15;1115,4)(434,15;1117,7)(422,69;1119)(411,88;1120,3)(402,8;1122,9)(391,93;1127,3)(384,15;1133,7)(376,15;1141,6)(369,88;1151,1)(363,47;116
1,5)(351,54;1181,6)(326,43;1223,5)(316,08;1242,5)(311,64;1252,2)(308,93;1260,5)(307,24;1268,7)(305,76;1281,1)(305,76;1291,8)(306,25;1321,5)(306,71;1351,3)(307,45;1381,3)(307,45;1411,3)(307,94;1441,3)(307,2;1451,5)(305,72;1462,1)(303,01;1470,8)(294,36;1500)(287,98;1521,1)(281,08;1553,1)(274,67;1587,8)(270,72;1620,5)(268,75;1650)(268,59;1680,5)(269;1709,3)(270,48;1738,5)(273,93;1769,8)(278,61;1799,3)(284,28;1829,3)(292,38;1859,3)(301,29;1888,8)(311,64;1918,8)(322,48;1948,3)(324,95;1960)(326,67;1973,3)(327,17;1993,5)(329,63;2006,5)(336,23;2021,4)(348,37;2035,5)(361,67;2050,7)(368,57;2062,8)(372,27;2078)(371,53;2090,5)(369,07;2100,9)(364,14;2109,8)(356;2120,4)(348,36;2129,9)(340,97;2140,2)(336,04;2149,1)(334,07;2156,8)(332,84;2172,2)(332,59;2187,8)(333,01;2217,5)(334;2247)(336,04;2275,3)(336,04;2305)(337,03;2334,8)(337,76;2365,5)(338,26;2395,5)(340,19;2425)(340,23;2445,3))"/>
  </SPB>
  <SPB name="1">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="((1361,8;1216,3)(1366,3;1203,1)(1375,2;1187,6)(1386,7;1174,2)(1399,5;1166,1)(1412,7;1162,1)(1428,7;1160,7)(1444,8;1163)(1459,9;1169)(1472;1177,6)(1483,4;1189,6)(1489,4;1201,7)(1493,7;1216,8)(1494,3;1233,2)(1491,4;1247,8)(1484,6;1261,5)(1474;1274,1)(1463,4;1283,3)(1447,3;1290,5)(1431;1293)(1417;1292,5)(1400,4;1289,3)(1389,8;1282,7)(1375,8;1270,4)(1365,2;1254,9)(1361,2;1238)(1360,6;1226,3)(1361,5;1216,3)(1366;1203,1))"/>
  </SPB>
  <SPB name="2">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="((534,99;2313,9)(542,33;2295,3)(553,02;2280,6)(567,38;2269,6)(582,66;2262,6)(598,24;2259,5)(612,29;2260,1)(626,96;2263,5)(642,84;2271,7)(656,9;2284,3)(666,98;2298,4)(671,56;2314,9)(672,48;2332)(670,34;2349,4)(663,62;2366,8)(652,01;2380,6)(638,87;2390,1)(627,26;2395)(612,6;2398,3)(597,01;2398,3)(580,82;2395)(566,46;2387,9)(553,63;2376,3)(542,94;2361,6)(536,22;2347)(534,08;2328)(534,69;2313,9)(542,33;2295,6))"/>
  </SPB>
  <NTP_INT name="Count" value="3"/>
</SPB>

2 answers:

Answer 0 (score: 1)

You can use the etree module from lxml together with its XPath support:

from lxml import etree

# Pass bytes: on Python 3, lxml rejects a str that carries an encoding declaration.
content = b'''<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<SPB name="PolygonList">
  <SPB name="0">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example1"/>
  </SPB>
  <SPB name="1">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example2"/>
  </SPB>
  <SPB name="2">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example3"/>
  </SPB>
  <NTP_INT name="Count" value="3"/>
</SPB>'''

tree = etree.XML(content)
values = tree.xpath('//NTP_BOOL[@value="true"]/following-sibling::NTP_TXT[@name="Polygon"]/@value')
print(values)

Output

['example1', 'example2', 'example3']

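This finds all NTP_BOOL elements that have the attribute value="true", then selects every following-sibling NTP_TXT element with the attribute name="Polygon", and returns the value attribute of those elements.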
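The same selection can also be written as an explicit loop over each inner SPB, which makes every step visible and additionally checks name="Active" on the NTP_BOOL element. What follows is only a minimal sketch, not the code above: it uses the standard-library xml.etree.ElementTree instead of lxml, the helper name active_polygons is hypothetical, and the shortened sample (with one polygon marked inactive) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample: the second polygon is inactive purely for illustration.
SAMPLE = b'''<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<SPB name="PolygonList">
  <SPB name="0">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Polygon" value="example1"/>
  </SPB>
  <SPB name="1">
    <NTP_BOOL name="Active" value="false"/>
    <NTP_TXT name="Polygon" value="example2"/>
  </SPB>
  <NTP_INT name="Count" value="2"/>
</SPB>'''


def active_polygons(xml_bytes):
    """Map each active inner SPB's name to its Polygon value (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for spb in root.findall("SPB"):  # direct SPB children of the outer SPB
        active = spb.find("NTP_BOOL[@name='Active']")
        polygon = spb.find("NTP_TXT[@name='Polygon']")
        if active is None or polygon is None:
            continue  # skip entries that lack either element
        # Compare case-insensitively, in case the file spells it "TRUE".
        if active.get("value", "").lower() == "true":
            result[spb.get("name")] = polygon.get("value")
    return result


print(active_polygons(SAMPLE))
```

Returning a dict keyed by the SPB name keeps track of which polygon each value came from, which the flat list from the XPath query does not.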

Answer 1 (score: 0)

An equivalent option is to use the xml.etree.ElementTree module. Using @lambo477's XML example, it looks like this:

import xml.etree.ElementTree as ET

content = '''<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<SPB name="PolygonList">
  <SPB name="0">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example1"/>
  </SPB>
  <SPB name="1">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example2"/>
  </SPB>
  <SPB name="2">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Custom" value=""/>
    <NTP_TXT name="CustomMatch" value=""/>
    <NTP_TXT name="Polygon" value="example3"/>
  </SPB>
  <NTP_INT name="Count" value="3"/>
</SPB>'''

stuff = ET.fromstring(content)
lst = [x.get('value') for x in stuff.findall("./SPB/NTP_BOOL[@value='true']/../NTP_TXT[@name='Polygon']")]
print(lst)

Output

['example1', 'example2', 'example3']

One last comment: if you are reading the XML file from disk, use:

stuff = ET.parse('your-file.xml')

instead of:

stuff = ET.fromstring(content)
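One detail worth knowing when switching to a file: ET.parse returns an ElementTree object that wraps the document, whereas ET.fromstring returns the root element directly. Calling getroot() on the parsed tree gives the same kind of object in both cases. A minimal sketch, assuming a temporary file stands in for your-file.xml and using a shortened, made-up sample:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Shortened, made-up sample standing in for the real file contents.
XML_BYTES = b'''<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<SPB name="PolygonList">
  <SPB name="0">
    <NTP_BOOL name="Active" value="true"/>
    <NTP_TXT name="Polygon" value="example1"/>
  </SPB>
  <SPB name="1">
    <NTP_BOOL name="Active" value="false"/>
    <NTP_TXT name="Polygon" value="example2"/>
  </SPB>
  <NTP_INT name="Count" value="2"/>
</SPB>'''

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as handle:
    handle.write(XML_BYTES)
    path = handle.name

try:
    tree = ET.parse(path)  # ElementTree object wrapping the whole document
    root = tree.getroot()  # the outer <SPB name="PolygonList"> element
    polygons = [
        elem.get("value")
        for elem in root.findall(
            "./SPB/NTP_BOOL[@value='true']/../NTP_TXT[@name='Polygon']"
        )
    ]
finally:
    os.remove(path)  # clean up the temporary file

print(polygons)
```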