XML parsing with multiple rows in Python

Time: 2018-10-16 19:05:35

Tags: python xml minidom

I can't parse this kind of XML file:

<items>
  <item>
   <name>Car</name>
   <description>
      <specification>
          <color>blue</color>
      </specification>
      <specification>
          <color>yellow</color>
      </specification>
   </description>
  </item>
 </items>

I want to retrieve all of the colors, separated by commas.

I'm a beginner in Python.

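from xml.dom import minidom

doc = minidom.parse("items.xml")  # hypothetical file containing the XML above
items = doc.getElementsByTagName("items")
for item in items:
    # <name> is a child element, not an attribute, so this returns ""
    name = item.getAttribute("name")
    # [0] keeps only the first <color> found under this element
    color = item.getElementsByTagName("color")[0]
    print(name, color.firstChild.data)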

Thanks.

2 answers:

Answer 0 (score: 0)

I would recommend BeautifulSoup:

from bs4 import BeautifulSoup

a = '''<items>
  <item>
   <name>Car</name>
   <description>
      <specification>
          <color>blue</color>
      </specification>
      <specification>
          <color>yellow</color>
      </specification>
   </description>
  </item>
 </items>'''

soup = BeautifulSoup(a, "html.parser")
# collect the text of every <color> tag, wherever it is nested
color_list = [tag.get_text() for tag in soup.find_all('color')]
print(','.join(color_list))  # blue,yellow
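Since the question is tagged minidom, here is a minimal standard-library sketch of the same extraction using xml.dom.minidom instead of BeautifulSoup. It assumes the XML is well-formed (properly closed </item> and </items> tags) and held in a string; the names xml_text and colors_by_item are illustrative only. The key differences from the attempt in the question: it loops over <item> rather than <items>, reads <name> as a child element rather than an attribute, and keeps every <color> instead of only index [0].

```python
from xml.dom import minidom

xml_text = """<items>
  <item>
    <name>Car</name>
    <description>
      <specification>
        <color>blue</color>
      </specification>
      <specification>
        <color>yellow</color>
      </specification>
    </description>
  </item>
</items>"""

doc = minidom.parseString(xml_text)
colors_by_item = {}
# Iterate over each <item> (not <items>) so every item is handled separately.
for item in doc.getElementsByTagName("item"):
    # <name> is a child element, so read its text node instead of an attribute.
    name = item.getElementsByTagName("name")[0].firstChild.data
    # Collect every <color> below this item, not just the first one.
    colors = [c.firstChild.data for c in item.getElementsByTagName("color")]
    colors_by_item[name] = ",".join(colors)
    print(name, colors_by_item[name])  # Car blue,yellow
```

To read from a file instead of a string, minidom.parse(path) returns the same kind of document object.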

Answer 1 (score: 0)

Thanks! That works for this case, but I can't get it to work on a larger example:

<TradeMark>
   <MarkImageDetails>
      <MarkImage>
         <MarkImageFilename>FMARK0000000004393852</MarkImageFilename>
         <MarkImageFileFormat>TIFF</MarkImageFileFormat>
      </MarkImage>
   </MarkImageDetails>
   <GoodsServicesDetails>
      <GoodsServices>
         <ClassificationKindCode>Nice</ClassificationKindCode>
         <ClassDescriptionDetails>
            <ClassDescription>
               <ClassNumber>35</ClassNumber>
            </ClassDescription>
            <ClassDescription>
               <ClassNumber>41</ClassNumber>
            </ClassDescription>
            <ClassDescription>
               <ClassNumber>42</ClassNumber>
            </ClassDescription>
         </ClassDescriptionDetails>
      </GoodsServices>
   </GoodsServicesDetails>
</TradeMark>

I want to extract the ClassNumber values.
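A minimal sketch for the larger document, using the standard-library xml.etree.ElementTree rather than BeautifulSoup. Element.iter("ClassNumber") visits every descendant with that tag at any depth, so how deeply the values are nested does not matter. It assumes the file declares no XML namespace (the snippet shows none); with a default namespace the tag would have to be written as {namespace-uri}ClassNumber. One likely reason the BeautifulSoup answer does not carry over: its "html.parser" lowercases tag names, so a search for 'ClassNumber' finds nothing there, while 'classnumber' would.

```python
import xml.etree.ElementTree as ET

xml_text = """<TradeMark>
   <MarkImageDetails>
      <MarkImage>
         <MarkImageFilename>FMARK0000000004393852</MarkImageFilename>
         <MarkImageFileFormat>TIFF</MarkImageFileFormat>
      </MarkImage>
   </MarkImageDetails>
   <GoodsServicesDetails>
      <GoodsServices>
         <ClassificationKindCode>Nice</ClassificationKindCode>
         <ClassDescriptionDetails>
            <ClassDescription>
               <ClassNumber>35</ClassNumber>
            </ClassDescription>
            <ClassDescription>
               <ClassNumber>41</ClassNumber>
            </ClassDescription>
            <ClassDescription>
               <ClassNumber>42</ClassNumber>
            </ClassDescription>
         </ClassDescriptionDetails>
      </GoodsServices>
   </GoodsServicesDetails>
</TradeMark>"""

root = ET.fromstring(xml_text)
# iter() walks every descendant element named ClassNumber, at any depth.
class_numbers = [el.text for el in root.iter("ClassNumber")]
print(",".join(class_numbers))  # 35,41,42
```

For a file on disk, ET.parse(path).getroot() gives the same root element to call iter() on.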