Question

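I have just started using (= learning) Python 2.7. My current focus is extracting information from XML files. So far, xml.etree.ElementTree has taken me quite far, but I am now stuck on a "KeyError". The reason, as far as I understand it, is that the elements have different attributes.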

The key part of the (larger) XML file:

<?xml version='1.0' encoding='utf-8' ?>

<XMLFILE>
  <datasources>
    <datasource caption='Sheet1 (ExcelSample)'>
      <connection class='excel-direct' filename='~\SomeExcel.xlsx' .....>
        ......
      </connection>
      <column header='Unit Price' datatype='real' name='[Calculation_1]'     role='measure' type='quantitative'>
        <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
      </column>
      <column datatype='integer' name='[Sales]' role='measure' type='quantitative' user:auto-column='numrec'>
        <calculation class='trial' formula='1' />
      </column>
    </datasource>
  </datasources>
  ........
</XMLFILE>

My Python code works for extracting the datatype and the name, i.e. the attributes that are present in both columns:

for cal in xmlfile.findall('datasources/datasource/column'):
    dt= cal.attrib[ 'datatype' ]
    nm= cal.attrib[ 'name' ]
    print 'Column name:', nm, '    ', 'datatype:', dt

Result:

Column name: Calculation_1,    datatype:real
Column name: Sales,    datatype:integer

But if I use cal.attrib['header'], Python 2.7 prints:

KeyError: 'header'

Question: how do I tell Python 2.7 to produce the desired output:

Calculation "Unit Price": Sum(Profit)/Sum(Sales)

More precisely, what Python should do is: for all columns (or just the one, as in the example above) that contain the attribute 'header', print this output.

(Note: to show a more complete desired output I added another column, which I have not yet done in my example.)

Thank you very much for your help!

Answer 1

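You can use an XPath predicate expression to filter elements by a specific criterion, i.e. to select only the column elements that have a header attribute: column[@header] *. Your for loop would then look like this:

for cal in xmlfile.findall('datasources/datasource/column[@header]'):
    print "header: " + cal.attrib["header"]
    print " formula: " + cal.find('calculation').attrib["formula"]

*) Note that the @attribute_name syntax is used to reference XML attributes in XPath.

If instead you want to loop over every column regardless of whether it has a header attribute, and print the header value only for the columns that have one, a simple if block inside the loop is enough: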

if "header" in cal.attrib:
    print "header: " + cal.attrib["header"]
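
For reference, both variants can be put together in a self-contained script. This is only a sketch: the XML below is a trimmed-down stand-in for the file in the question (the connection element and the user:auto-column attribute are left out, because an undeclared user: prefix would not parse on its own), and the names SAMPLE_XML, headers_with_predicate and headers_with_if are made up for illustration.

```python
from __future__ import print_function  # same code runs on Python 2.7 and 3

import xml.etree.ElementTree as ET

# Simplified stand-in for the XML in the question (an assumption, not the real file).
SAMPLE_XML = """<XMLFILE>
  <datasources>
    <datasource caption='Sheet1 (ExcelSample)'>
      <column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
        <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
      </column>
      <column datatype='integer' name='[Sales]' role='measure' type='quantitative'>
        <calculation class='trial' formula='1' />
      </column>
    </datasource>
  </datasources>
</XMLFILE>"""


def headers_with_predicate(root):
    """Select only <column> elements that carry a header attribute."""
    results = []
    for cal in root.findall('datasources/datasource/column[@header]'):
        formula = cal.find('calculation').attrib['formula']
        results.append((cal.attrib['header'], formula))
    return results


def headers_with_if(root):
    """Visit every <column> and skip the ones without a header attribute."""
    results = []
    for cal in root.findall('datasources/datasource/column'):
        if 'header' in cal.attrib:
            formula = cal.find('calculation').attrib['formula']
            results.append((cal.attrib['header'], formula))
    return results


root = ET.fromstring(SAMPLE_XML)
for header, formula in headers_with_predicate(root):
    print('Calculation "%s": %s' % (header, formula))
```

With this sample data, both functions return the same single (header, formula) pair, and the loop prints the line asked for in the question. The predicate version keeps the filtering in the path expression; the if version is handier when the columns without a header need some other treatment in the same loop.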

Answer 2

Maybe you could use the "BeautifulSoup" (bs4) module instead of "xml.etree".

See "Python BeautifulSoup XML Parsing" and "Extracting properly data with bs4?".

Answer 3

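KeyError is Python's way of telling you that the attribute you asked for could not be found on that element. That is fine: most likely the XPath you pass to findall is also matching elements that have no "header" attribute. Since you are only interested in the elements that match your search and happen to have a "header" attribute attached to them, you can do the following: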

for cal in xmlfile.findall('datasources/datasource/column'):
    try:
        header = cal.attrib["header"]
        #Do something with the header
        print header
    except KeyError:
        #This is where you end up if the element doesn't have a 'header' attribute
        #You shouldn't have to do anything with this 'cal' element
        pass

Of course, you could check whether the header exists first, but after using Python for a while I find this approach simpler.
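
Another variant, not mentioned in the answers above, is Element.get(), which returns None (or a default you pass in) instead of raising KeyError. The sketch below puts it next to the try/except approach; the tiny inline XML and the function names headers_try_except and headers_get are illustrative assumptions, not part of the original file or code.

```python
from __future__ import print_function  # same code runs on Python 2.7 and 3

import xml.etree.ElementTree as ET

# Tiny made-up document: one column with a header attribute, one without.
root = ET.fromstring(
    "<datasource>"
    "<column header='Unit Price' name='[Calculation_1]' />"
    "<column name='[Sales]' />"
    "</datasource>"
)


def headers_try_except(parent):
    """Collect header values, skipping columns that raise KeyError."""
    found = []
    for cal in parent.findall('column'):
        try:
            found.append(cal.attrib['header'])
        except KeyError:
            pass  # no 'header' attribute on this column, nothing to do
    return found


def headers_get(parent):
    """Same result via Element.get(), which returns None for a missing attribute."""
    found = []
    for cal in parent.findall('column'):
        header = cal.get('header')
        if header is not None:
            found.append(header)
    return found


print(headers_try_except(root))
print(headers_get(root))
```

Both functions return a list containing only 'Unit Price' for this input. Which one reads better is largely a matter of taste: try/except follows the "ask forgiveness" style, while get() avoids exception handling for what is an expected case.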

python 2.7: Different attributes

3 answers: