Question

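I am trying to convert an XML file to a CSV file. I tried bash scripts with awk and xmlstarlet without luck, and now I am trying it in Python, still without luck. Below is my sample XML file: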

<items><item>
<Name>demo title 1</Name>
<FileType>image</FileType>
<ReleaseDate>15 May 2015</ReleaseDate>
<Quality>
HDRiP</Quality>
<size>2848292</size>
<Rating>6.6</Rating>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 2,
Hai demo 3,
Ale Demo 4</Cast>
<Languages>English</Languages>
<Subtitles>
hindi</Subtitles>
<FileName>demo title 1 fname</FileName>
<FileSize>1.4GB</FileSize>
<NoOfFiles>5</NoOfFiles>
<UploadTime>4 months</UploadTime>
<DateOfDataCapture>May 29, 2015</DateOfDataCapture>
<TimesDownloaded>2,339</TimesDownloaded>
<UpVotes>+742</UpVotes>
<DownVotes>-37</DownVotes>
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType>
<Summary>this is demo pics
 collected for wallpapers only it is free available on many app and urls.

Written by

demo1.Cdemo324.78K

report summary</Summary>
</item><item>
<Name>demo title 2</Name>
<FileType>image</FileType>
<ReleaseDate>16 May 2015</ReleaseDate>
<Quality>
HDRiP</Quality>
<size>2855292</size>
<Rating>6.9</Rating>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 12,
Hai demo 13,
Ale Demo 14</Cast>
<Languages>English</Languages>
<Subtitles>
hindi</Subtitles>
<FileName>demo title 2 fname</FileName>
<FileSize>1.3GB</FileSize>
<NoOfFiles>5</NoOfFiles>
<UploadTime>4 months</UploadTime>
<DateOfDataCapture>May 29, 2015</DateOfDataCapture>
<TimesDownloaded>2,339</TimesDownloaded>
<UpVotes>+742</UpVotes>
<DownVotes>-37</DownVotes>
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType>
<Summary>this is demo pics 2
 collected for wallpapers only it is free available on many app and urls.

Written by

demo2.C2demo324.78K

report summary</Summary>
</item>
</items>

I want to convert this into a CSV file in which each <item> record is on a single line.

When I use an XML parser, the records are converted to CSV, but my tag values span multiple lines and contain newline characters, so the CSV output is broken across lines in the same way. Below is a sample of the converted CSV file:
demo title 1,image,15 May 2015,
HDRiP,
2848292,6.6,Comedy,
Music,rules bank demo,
anademo demo 2,
Hai demo 3,
Ale Demo 4,English

I want each newline character to be replaced by a space, so that all the fields of a single item are saved in one row of the CSV file.

I also tried the Python XML parser xml2csv, but without luck. Please explain how to read the XML file and remove these unwanted newline characters.

Answer 1

Try something like this:

    # Python 3; requires lxml
    import csv
    from lxml import etree

    # in:  xml with trader joe's locations
    # out: csv with trader joe's locations

    out = input("Name for output file:  ")
    if out.strip() == "":
        out = "trader-joes-all-locations.csv"

    out_data = []

    # use recover=True to ignore errors in the XML
    # examples of errors in this XML:
    #   missing "<" in opening tag:
    #     fax></fax>
    #   missing "</" in closing tag:
    #     <uid>1429860810uid>
    #
    # also ignore blank text
    parser = etree.XMLParser(recover=True, remove_blank_text=True)

    # xml on disk...could also pass etree.parse a URL
    file_name = "trader-joes-all-locations.xml"

    # use lxml to read and parse xml
    root = etree.parse(file_name, parser)

    # element names with data to keep
    tag_list = ["name", "address1", "address2", "beer", "city", "comingsoon",
                "hours", "latitude", "longitude", "phone", "postalcode",
                "spirits", "state", "wine"]

    # add field names by copying tag_list
    out_data.append(tag_list[:])

    def missing_location(p):
        lat = p.find("latitude")
        lon = p.find("longitude")
        return lat is None or lon is None

    # pull info out of each poi node
    def get_poi_info(p):
        # if latitude or longitude doesn't exist, skip
        if missing_location(p):
            print("\tMissing location for %s" % p.find("name").text)
            return None
        info = []
        for tag in tag_list:
            # if tag == "name":
            #     print("%s" % p.find(tag).text)
            node = p.find(tag)
            if node is not None and node.text:
                if tag == "latitude" or tag == "longitude":
                    info.append(round(float(node.text), 5))
                else:
                    info.append(node.text)
            else:
                info.append("")
        return info

    print("\nreading xml...")

    # get all <poi> elements
    pois = root.findall(".//poi")
    for p in pois:
        poi_info = get_poi_info(p)
        # print("%s" % poi_info)
        if poi_info:
            out_data.append(poi_info)

    print("finished xml, writing file...")

    # newline="" lets the csv module control the line endings
    with open(out, "w", newline="", encoding="utf-8") as out_file:
        csv_writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
        for row in out_data:
            csv_writer.writerow(row)

    print("wrote %s\n" % out)
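
The script above targets a different XML file (Trader Joe's locations), so it needs adapting for the <items>/<item> structure in the question. A minimal sketch of that adaptation follows, using only the standard library and assuming every <item> holds flat child elements as in the sample. The helper names (collapse_whitespace, items_to_rows, rows_to_csv) are made up for this illustration. Newlines inside a value are removed by collapsing every run of whitespace into a single space before the row is written.

```python
import csv
import io
import xml.etree.ElementTree as ET


def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join((text or "").split())


def items_to_rows(xml_text):
    """Return a header row followed by one row per <item> element."""
    root = ET.fromstring(xml_text)
    items = root.findall("item")
    # Build the header from child tag names, in first-seen order.
    header = []
    for item in items:
        for child in item:
            if child.tag not in header:
                header.append(child.tag)
    rows = [header]
    for item in items:
        # findtext returns None for a missing child; that becomes "".
        rows.append([collapse_whitespace(item.findtext(tag)) for tag in header])
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Shortened version of the sample from the question.
    sample = """<items><item>
<Name>demo title 1</Name>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
</item></items>"""
    print(rows_to_csv(items_to_rows(sample)), end="")
```

For a file on disk, read its text and pass it to items_to_rows, then write the result with open(..., "w", newline=""). Values that contain commas, such as the Genre and Cast fields, are quoted by the csv module, so each item still occupies exactly one line.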
