Using Beautiful Soup with Python 3

Date: 2019-05-06 06:20:18

Tags: python python-3.x parsing beautifulsoup xml-parsing

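I am trying to parse an AUTOSAR-specific arxml file (an XML-like format) with Python, but I cannot read the file's contents. I want to get the values defined inside the multiple DEFINITION-REF tags of each ECUC-CONTAINER-VALUE, for example: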

/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef

I have tried several approaches, but nothing gets printed.

from bs4 import BeautifulSoup as Soup

def parseArxml():
    handler = open('input.arxml').read()
    soup = Soup(handler,"html.parser")
    for ecuc_container in soup.findAll('ECUC-CONTAINER-VALUE'):
        print(ecuc_container)

if __name__ == "__main__":
    parseArxml()

This is part of the arxml file:

<?xml version="1.0" encoding="UTF-8"?>
<AUTOSAR xmlns="http://autosar.org/schema/r4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://autosar.org/schema/r4.0 autosar_4-2-1.xsd">
    <ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">
      <SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>
      <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
      <REFERENCE-VALUES>
        <ECUC-REFERENCE-VALUE>
          <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
          <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_0/BswMConfig_0/BswMArbitration_0/BswMModeCondition_2</VALUE-REF>
        </ECUC-REFERENCE-VALUE>
      </REFERENCE-VALUES>
    </ECUC-CONTAINER-VALUE>

    <ECUC-CONTAINER-VALUE UUID="c112c504-e546-41c3-abf9-0aaf06b18284">
      <SHORT-NAME>BswMLogicalExpression_3</SHORT-NAME>
      <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
      <REFERENCE-VALUES>
        <ECUC-REFERENCE-VALUE>
          <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
          <VALUE-REF DEST="ECUC-CONTAINER-VALUE">/ARRoot/BswM_2/BswMConfig_2/BswMArbitration_2/BswMModeCondition_3</VALUE-REF>
        </ECUC-REFERENCE-VALUE>
      </REFERENCE-VALUES>
    </ECUC-CONTAINER-VALUE>
</AUTOSAR>

2 answers:

Answer 0 (score: 0)

It looks like your parser and BeautifulSoup version are converting the tag names to lowercase.

You should search with lowercase names instead:

from bs4 import BeautifulSoup as Soup

def parseArxml():
    handler = open('input.arxml').read()
    soup = Soup(handler,"html.parser")
    for ecuc_container in soup.find_all('ecuc-container-value'):
        for def_ref in ecuc_container.find_all('definition-ref'):
            print(def_ref.get_text())

if __name__ == "__main__":
    parseArxml()

Output:

/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression
/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef
/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression
/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef
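
The lowercasing can be reproduced without Beautiful Soup, since its "html.parser" backend builds on Python's standard html.parser module. The following is only a minimal sketch: the TagCollector class and the inline sample string are made up for illustration and are not part of the original question.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag name the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lowercase.
        self.tags.append(tag)


sample = (
    '<ECUC-CONTAINER-VALUE UUID="x">'
    "<SHORT-NAME>Demo</SHORT-NAME>"
    "</ECUC-CONTAINER-VALUE>"
)

collector = TagCollector()
collector.feed(sample)
collector.close()

print(collector.tags)  # ['ecuc-container-value', 'short-name']
```

This is why a search for 'ECUC-CONTAINER-VALUE' finds nothing after parsing with "html.parser", while the lowercase name matches.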

Answer 1 (score: 0)

If you print(soup), you will see that the parser has converted the tag names to lowercase. So use lowercase when searching by tag name:

for ecuc_container in soup.findAll('ECUC-CONTAINER-VALUE'.lower()):

Or simply:

for ecuc_container in soup.findAll('ecuc-container-value'):

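Even better: parse the document explicitly as XML so that the case of the tags is not changed (the 'xml' parser requires the lxml package to be installed):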

soup = Soup(handler,'xml')

Here is how you can get a list of the text inside the <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF"> elements:

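from bs4 import BeautifulSoup as Soup

def parseArxml():
    # Close the file automatically once its contents have been read
    with open('input.arxml') as f:
        handler = f.read()
    soup = Soup(handler,'xml')
    # d.get() returns None instead of raising KeyError when DEST is missing
    dest = [d.text for d in soup.findAll('DEFINITION-REF') if d.get('DEST')=='ECUC-CHOICE-REFERENCE-DEF']
    print(dest)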

Output:

['/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef',
'/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef']

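Or, if you want all DEFINITION-REF tags regardless of their attributes, use the following (the XML parser keeps the original case, so the tag name must be uppercase here):

dest = [d.text for d in soup.findAll('DEFINITION-REF')]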
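
As an alternative that avoids the extra dependency, the same lookup can be sketched with the standard library's xml.etree.ElementTree instead of Beautiful Soup. This is a different technique from the one in the answers above. The function name definition_refs and the shortened inline sample are assumptions for illustration only. Because the file declares a default namespace, element names must be qualified with it.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <AUTOSAR> root element.
AR_NS = "http://autosar.org/schema/r4.0"

# Shortened, illustrative sample based on the file in the question.
SAMPLE = """<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <ECUC-CONTAINER-VALUE>
    <SHORT-NAME>BswMLogicalExpression_2</SHORT-NAME>
    <DEFINITION-REF DEST="ECUC-PARAM-CONF-CONTAINER-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression</DEFINITION-REF>
    <REFERENCE-VALUES>
      <ECUC-REFERENCE-VALUE>
        <DEFINITION-REF DEST="ECUC-CHOICE-REFERENCE-DEF">/AUTOSAR/ecucdef/BswM/BswMConfig/BswMArbitration/BswMLogicalExpression/BswMArgumentRef</DEFINITION-REF>
      </ECUC-REFERENCE-VALUE>
    </REFERENCE-VALUES>
  </ECUC-CONTAINER-VALUE>
</AUTOSAR>"""


def definition_refs(xml_text, dest=None):
    """Return the text of every DEFINITION-REF, optionally filtered by DEST."""
    root = ET.fromstring(xml_text)
    refs = []
    # Tag names are case-sensitive and must carry the namespace in braces.
    for elem in root.iter(f"{{{AR_NS}}}DEFINITION-REF"):
        if dest is None or elem.get("DEST") == dest:
            refs.append((elem.text or "").strip())
    return refs


all_refs = definition_refs(SAMPLE)
choice_refs = definition_refs(SAMPLE, dest="ECUC-CHOICE-REFERENCE-DEF")
print(all_refs)
print(choice_refs)
```

For a file on disk, ET.parse('input.arxml').getroot() could take the place of ET.fromstring in the same sketch.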