How to loop through multiple XML files, parse them, and send them to a REST API

Time: 2019-07-23 22:04:54

Tags: xml python-3.x file xml-parsing elementtree

import os
from xml.etree import ElementTree as ET
# files are in a sub folder where this script is being ran
path = "attachments"
for filename in os.listdir(path):
    # Only get xml files
    if not filename.endswith('.xml'): continue
    # I haven't been able to get it to work by just saying 'if filename.endswith('.xml')' only if not..
    fullname = os.path.join(path, filename)
    # This joins the path for each file it files so that python knows the full path / filename to trigger parser
    tree = ET.parse(fullname)
    # Parse the files..
    print(tree)
    # Get the root of the XML tree structure
    root = tree.getroot()
    # Print the tags it finds from all the child elements from root
for child in root:
    print(child.tag, child.text)

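I have the REST message built. I have figured out how to parse the XML of a single file.

What I can't figure out is how to parse multiple files in a directory. In other words, I want to loop through each file, parse the XML, and pass certain elements from the XML into a REST POST message.

I have spent the past two days trying everything I could find on the internet. Nothing seems to work, or I'm doing something wrong... :\

In my comments I explain what I think is happening. You can see that I give it the path where the files are, join it with each file name, parse the XML file, and print the tags/text. When run, it confirms that I do have six objects (which is how many *.xml files are in that directory). It then prints the elements and text of only one of them, which is actually the middle file (the 4th file).

Here is the exact output I'm seeing, minus some sensitive data.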

<xml.etree.ElementTree.ElementTree object at 0x0000018EF60E7608>
<xml.etree.ElementTree.ElementTree object at 0x0000018EF60DFE08>
<xml.etree.ElementTree.ElementTree object at 0x0000018EF62B1B08>
<xml.etree.ElementTree.ElementTree object at 0x0000018EF62E3F48>
<xml.etree.ElementTree.ElementTree object at 0x0000018EF629B608>
<xml.etree.ElementTree.ElementTree object at 0x0000018EF62B6988>

NUMBER 20514218
PARENT

STATUS 1-CLOSED
OPEN_DATE 12/01/2017 00:34:35
CLOSE_DATE 12/05/2017 17:48:28
SOURCE Self Service
PROCESS HR INTERNAL REQUEST FORM
CATEGORY HR Connect
SUB_CATEGORY Personnel Action Change/Update
USER_ID *sensitive information*
LAST_NAME *sensitive information*
FIRST_NAME Brandon
SITUATION SELECT...
PRIORITY 5 Days
ADVISOR_NAME ROMAN *sensitive information*
TEAM *sensitive information*
NEXT_ACTION

PROCESS_STATUS Verified
TRANSFERT_DATE

DEADLINE 12/12/2017 17:18:03
QUEUE HR Internal Request
FROZEN_DATE

OTHER_EMPLOYEE_ID *sensitive information*
REQUEST *sensitive information*

HISTORY_RESPONSE *sensitive information*
FINAL_RESPONSE *sensitive information*

-------------------here's the raw XML----------------------
<?xml version="1.0" encoding="UTF-8"?>
<CASE>
  <NUMBER>20514218</NUMBER>
  <PARENT>
  </PARENT>
  <STATUS>1-CLOSED</STATUS>
  <OPEN_DATE>12/01/2017 00:34:35</OPEN_DATE>
  <CLOSE_DATE>12/05/2017 17:48:28</CLOSE_DATE>
  <SOURCE>Self Service</SOURCE>
  <PROCESS>HR INTERNAL REQUEST FORM</PROCESS>
  <CATEGORY>HR Connect</CATEGORY>
  <SUB_CATEGORY>Personnel Action Change/Update</SUB_CATEGORY>
  <USER_ID>*sensitive information*</USER_ID>
  <LAST_NAME>*sensitive information*</LAST_NAME>
  <FIRST_NAME>*sensitive information*</FIRST_NAME>
  <SITUATION>SELECT...</SITUATION>
  <PRIORITY>5 Days</PRIORITY>
  <ADVISOR_NAME>ROMAN *sensitive information*</ADVISOR_NAME>
  <TEAM>2 HR SRV CNTR PA</TEAM>
  <NEXT_ACTION>
  </NEXT_ACTION>
  <PROCESS_STATUS>Verified</PROCESS_STATUS>
  <TRANSFERT_DATE>
  </TRANSFERT_DATE>
  <DEADLINE>12/12/2017 17:18:03</DEADLINE>
  <QUEUE>HR Internal Request</QUEUE>
  <FROZEN_DATE>
  </FROZEN_DATE>
  <OTHER_EMPLOYEE_ID>*sensitive information*</OTHER_EMPLOYEE_ID>
  <REQUEST>*sensitive information*</REQUEST>
  <HISTORY_RESPONSE>*sensitive information*</HISTORY_RESPONSE>
  <FINAL_RESPONSE>*sensitive information*</FINAL_RESPONSE>
</CASE>

1 Answer:

Answer 0 (score: 1)

import os
from xml.etree import ElementTree as ET

# The files are in a subfolder of the directory this script is run from
path = "attachments"
for filename in os.listdir(path):
    # Skip anything that is not an XML file
    if not filename.endswith('.xml'):
        continue
    # Join the folder and file name so the parser gets the full path
    fullname = os.path.join(path, filename)
    # Parse the file
    tree = ET.parse(fullname)
    print(tree)
    # Get the root of the XML tree structure
    root = tree.getroot()
    # Print the tag and text of every child of root. This loop is indented
    # inside the file loop so that it runs once per file.
    for child in root:
        print(child.tag, child.text)
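
Since the goal is to pass only certain elements on to the REST message, the same loop can collect those elements instead of printing everything. The following is a minimal sketch, not a definitive implementation: the function names read_case and read_all_cases and the choice of tags in WANTED_TAGS are assumptions made for illustration, while the tag names themselves come from the XML in the question.

```python
from pathlib import Path
from xml.etree import ElementTree as ET

# Hypothetical selection of fields; list whichever tags the REST message needs.
WANTED_TAGS = ("NUMBER", "STATUS", "OPEN_DATE", "CLOSE_DATE")


def read_case(xml_path, wanted=WANTED_TAGS):
    """Parse one CASE file and return {tag: text} for the wanted tags."""
    root = ET.parse(xml_path).getroot()
    record = {}
    for tag in wanted:
        # findtext returns None when the tag is missing. Empty elements such
        # as <PARENT> contain only whitespace, so strip and fall back to "".
        text = root.findtext(tag)
        record[tag] = (text or "").strip()
    return record


def read_all_cases(folder="attachments"):
    """Return {file name: record} for every .xml file in the folder."""
    return {p.name: read_case(p) for p in sorted(Path(folder).glob("*.xml"))}
```

Calling read_all_cases() with the default folder would give one dictionary per file, keyed by file name, which keeps the data of all six files available after the loop has finished.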

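The indentation was wrong; thanks to Jack Fleeting. In the question's code the "for child in root:" loop sat outside the file loop, so it ran only once, after the file loop had finished, and printed the children of whichever root was parsed last. Indented under the file loop, it runs for every file.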
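
For the last step, sending the parsed elements to the REST API, here is a rough sketch using only the standard library. The question does not show the REST message, so everything about it here is an assumption: the URL in API_URL is a placeholder, the JSON body format is a guess, and build_post and send are hypothetical helper names. Authentication is left out entirely.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); replace with the real REST API URL.
API_URL = "https://example.invalid/api/cases"


def build_post(record, url=API_URL):
    """Build, but do not send, a JSON POST request for one parsed record."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Inside the file loop, one would build a dictionary from the elements of interest, pass it to build_post, and then call send on the result. Keeping the build and send steps separate makes it possible to inspect each request before anything goes over the network.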