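I am trying to convert an XML file to a CSV file. I tried a bash script with awk and xmlstarlet but had no luck, and now I am trying it in Python, still without success. Below is my sample XML file: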
<items><item>
<Name>demo title 1</Name>
<FileType>image</FileType>
<ReleaseDate>15 May 2015</ReleaseDate>
<Quality>
HDRiP</Quality>
<size>2848292</size>
<Rating>6.6</Rating>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 2,
Hai demo 3,
Ale Demo 4</Cast>
<Languages>English</Languages>
<Subtitles>
hindi</Subtitles>
<FileName>demo title 1 fname</FileName>
<FileSize>1.4GB</FileSize>
<NoOfFiles>5</NoOfFiles>
<UploadTime>4 months</UploadTime>
<DateOfDataCapture>May 29, 2015</DateOfDataCapture>
<TimesDownloaded>2,339</TimesDownloaded>
<UpVotes>+742</UpVotes>
<DownVotes>-37</DownVotes>
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType>
<Summary>this is demo pics
collected for wallpapers only it is free available on many app and urls.
Written by
demo1.Cdemo324.78K
report summary</Summary>
</item><item>
<Name>demo title 2</Name>
<FileType>image</FileType>
<ReleaseDate>16 May 2015</ReleaseDate>
<Quality>
HDRiP</Quality>
<size>2855292</size>
<Rating>6.9</Rating>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 12,
Hai demo 13,
Ale Demo 14</Cast>
<Languages>English</Languages>
<Subtitles>
hindi</Subtitles>
<FileName>demo title 2 fname</FileName>
<FileSize>1.3GB</FileSize>
<NoOfFiles>5</NoOfFiles>
<UploadTime>4 months</UploadTime>
<DateOfDataCapture>May 29, 2015</DateOfDataCapture>
<TimesDownloaded>2,339</TimesDownloaded>
<UpVotes>+742</UpVotes>
<DownVotes>-37</DownVotes>
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType>
<Summary>this is demo pics 2
collected for wallpapers only it is free available on many app and urls.
Written by
demo2.C2demo324.78K
report summary</Summary>
</item>
</items>
I want to convert this into a CSV file in which each <item> record is on a single line.
When I use an XML parser, the records are converted to CSV, but my tag values span multiple lines and contain newline characters, so the CSV output is broken across lines in the same way.
Below is a sample of the converted CSV file:
demo title 1,image,15 May 2015,
HDRiP,
2848292,6.6,Comedy,
Music,rules bank demo,
anademo demo 2,
Hai demo 3,
Ale Demo 4,English
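I want each newline character to be replaced by a space, so that all fields of a single item are saved in one row of the CSV file.
I also tried the Python XML parser xml2csv, but had no luck. Please explain how to read the XML file and remove these unwanted newline characters.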
Answer 0 (score: 0)
Try something like this:
import csv
from lxml import etree

# in: xml with trader joe's locations
# out: csv with trader joe's locations
out = input("Name for output file: ")
if out.strip() == "":
    out = "trader-joes-all-locations.csv"

out_data = []

# use recover=True to ignore errors in the XML
# examples of errors in this XML:
#   missing "<" in opening tag:
#     fax></fax>
#   missing "</" in closing tag:
#     <uid>1429860810uid>
#
# also ignore blank text
parser = etree.XMLParser(recover=True, remove_blank_text=True)

# xml on disk...could also pass etree.parse a URL
file_name = "trader-joes-all-locations.xml"

# use lxml to read and parse xml
root = etree.parse(file_name, parser)

# element names with data to keep
tag_list = ["name", "address1", "address2", "beer", "city", "comingsoon", "hours", "latitude", "longitude", "phone", "postalcode", "spirits", "state", "wine"]

# add field names by copying tag_list
out_data.append(tag_list[:])


def missing_location(p):
    # True if either coordinate element is absent
    lat = p.find("latitude")
    lon = p.find("longitude")
    return lat is None or lon is None


# pull info out of each poi node
def get_poi_info(p):
    # if latitude or longitude doesn't exist, skip
    if missing_location(p):
        print("\tMissing location for %s" % p.find("name").text)
        return None
    info = []
    for tag in tag_list:
        # if tag == "name":
        #     print("%s" % p.find(tag).text)
        node = p.find(tag)
        if node is not None and node.text:
            if tag == "latitude" or tag == "longitude":
                info.append(round(float(node.text), 5))
            else:
                info.append(node.text)
        else:
            info.append("")
    return info


print("\nreading xml...")

# get all <poi> elements
pois = root.findall(".//poi")
for p in pois:
    poi_info = get_poi_info(p)
    # print("%s" % (poi_info))
    if poi_info:
        out_data.append(poi_info)

print("finished xml, writing file...")

# newline="" lets the csv module control line endings
with open(out, "w", newline="", encoding="utf-8") as out_file:
    csv_writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
    for row in out_data:
        csv_writer.writerow(row)

print("wrote %s\n" % out)
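The script above targets a different XML file (poi elements with location fields), so it does not show the newline cleanup the question asks for. Here is a minimal sketch of that step for the question's <item> data, using only the standard library. The names clean, items_to_csv, FIELDS and SAMPLE_XML are made up for this illustration, and the sample is cut down to four of the question's fields. The idea is to split each element's text on any whitespace and join the pieces back with single spaces before writing the row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the question's XML (only a few fields kept).
SAMPLE_XML = """<items><item>
<Name>demo title 1</Name>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 2,
Hai demo 3,
Ale Demo 4</Cast>
</item><item>
<Name>demo title 2</Name>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 12,
Hai demo 13,
Ale Demo 14</Cast>
</item>
</items>"""

# Hypothetical column list; extend it with the other tag names as needed.
FIELDS = ["Name", "Quality", "Genre", "Cast"]


def clean(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join((text or "").split())


def items_to_csv(xml_text, fields):
    """Return CSV text with a header row and one row per <item>."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for item in root.iter("item"):
        # findtext returns None for a missing tag; clean() maps that to "".
        writer.writerow([clean(item.findtext(tag)) for tag in fields])
    return buf.getvalue()


csv_text = items_to_csv(SAMPLE_XML, FIELDS)
print(csv_text)
```

To write to disk instead of a string, the same csv.writer can be given a file opened with open(path, "w", newline=""). Values that contain commas, such as Genre and Cast, are quoted by the csv module, so each item still ends up on one row.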