I have an XML structure that I am trying to convert to CSV in Python. My attempt so far:
from xml.etree import ElementTree
import csv

my_list = []
with open('/Users/testuser/Desktop/CMEREG1.XML', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.iter('TrdCaptRpt'):
    RptID = node.attrib.get('RptID')
    TrdTyp = node.attrib.get('TrdTyp')
    TrdSubTyp = node.attrib.get('TrdSubTyp')
    TrdDt = node.attrib.get('TrdDt')
    BizDt = node.attrib.get('BizDt')
    MLegRptTyp = node.attrib.get('MLegRptTyp')
    MtchStat = node.attrib.get('MtchStat')
    MsgEvtSrc = node.attrib.get('MsgEvtSrc')
    TrdID = node.attrib.get('TrdID')
    LastQty = node.attrib.get('LastQty')
    LastPx = node.attrib.get('LastPx')
    TxnTm = node.attrib.get('TxnTm')
    SettlCcy = node.attrib.get('SettlCcy')
    SettlDt = node.attrib.get('SettlDt')
    PxSubTyp = node.attrib.get('PxSubTyp')
    VenueTyp = node.attrib.get('VenueTyp')
    VenuTyp = node.attrib.get('VenuTyp')
    OfstInst = node.attrib.get('OfstInst')
    my_list.append[node.attrib.get('RptID')]
    print RptID, TrdTyp, TrdSubTyp, TrdDt, BizDt, MLegRptTyp, MtchStat, MsgEvtSrc, TrdID, LastQty, LastPx, TxnTm, SettlCcy, SettlDt, PxSubTyp, VenueTyp, VenuTyp, OfstInst

with open('/Users/anantsangar/Desktop/output.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(my_list)
I am trying to convert it to a CSV file. I tried the code above, but I can't get the correct output.
I can't get every tag into the CSV. Is there a simple way to export this as CSV?
Thanks

Answer 0 (score: 1)
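csv.DictWriter, getting values from the node.attrib dictionary

Your elements named TrdCaptRpt have attributes. If you have such a node, its node.attrib property holds a dictionary with a key/value pair for each attribute.
csv.DictWriter allows writing data taken from a dictionary.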
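To make the idea concrete, here is a minimal sketch that uses the stdlib parser and an in-memory buffer. The sample XML, the Batch wrapper element, and all attribute values are invented for illustration; only the element name TrdCaptRpt and the attribute names come from the question.

```python
import csv
import io
from xml.etree import ElementTree

# Hypothetical one-element sample; the values are made up for illustration.
SAMPLE = '<Batch><TrdCaptRpt RptID="1" TrdTyp="0" LastQty="5"/></Batch>'

root = ElementTree.fromstring(SAMPLE)
node = root.find("TrdCaptRpt")

# node.attrib maps attribute names to string values; copy it into a dict.
attrs = dict(node.attrib)

buf = io.StringIO()
# extrasaction="ignore" skips attributes that are not listed in the field names;
# listed fields that the element lacks are written as empty strings.
writer = csv.DictWriter(buf, ["RptID", "LastQty", "TrdDt"], extrasaction="ignore")
writer.writeheader()
writer.writerow(attrs)
csv_text = buf.getvalue()
```

Here TrdTyp is dropped because it is not in the field list, and TrdDt is left empty because the sample element does not carry it.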
First some imports (I always use lxml, because it is very fast and provides extra features):
from lxml import etree
import csv
Configure the file names and the fields to use in each record:

xml_fname = "data.xml"
csv_fname = "data.csv"
# Every name needs its own comma; adjacent string literals are silently concatenated.
fields = [
    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat", "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst"]
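Read the XML:

xml = etree.parse(xml_fname)

Iterate over the "TrdCaptRpt" elements, writing the attribute values to the CSV file:

with open(csv_fname, "w") as f:
    # extrasaction="ignore" skips attributes that are not listed in fields
    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for node in xml.iter("TrdCaptRpt"):
        writer.writerow(node.attrib)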
If you prefer the stdlib xml.etree.ElementTree, you should manage just as easily, because node.attrib exists there too.
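A stdlib-only variant might look like the sketch below. The function name trades_to_csv, the shortened field list, and the sample data are assumptions made for illustration; the loop itself is the same one used with lxml.

```python
import csv
import io
from xml.etree import ElementTree

# Shortened field list for the example; use the full list in practice.
FIELDS = ["RptID", "TrdTyp", "TrdDt", "LastQty", "LastPx"]

def trades_to_csv(xml_source, csv_target, fields=FIELDS):
    """Write one CSV row per TrdCaptRpt element and return the row count."""
    tree = ElementTree.parse(xml_source)
    writer = csv.DictWriter(csv_target, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for node in tree.iter("TrdCaptRpt"):
        writer.writerow(node.attrib)  # node.attrib is a plain dict here
        count += 1
    return count

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    '<Batch>'
    '<TrdCaptRpt RptID="1" TrdTyp="0" LastQty="5" LastPx="101.5"/>'
    '<TrdCaptRpt RptID="2" TrdTyp="1" TrdDt="2015-01-02"/>'
    '</Batch>'
)
out = io.StringIO()
n_rows = trades_to_csv(sample, out)
```

With real files on Python 3, opening the output with open(csv_fname, "w", newline="") avoids extra blank lines on Windows, as the csv module documentation recommends.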
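In your comment you noted that you want to export attributes from more element names. This is possible too. To do it, I modified the example to use xpath (which probably works only with lxml) and added an extra column "elm_name" to track which element each record was created from: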
fields = [
    "elm_name",
    "RptID", "TrdTyp", "TrdSubTyp", "ExecID", "TrdDt", "BizDt", "MLegRptTyp",
    "MtchStat", "MsgEvtSrc", "TrdID", "LastQty", "LastPx", "TxnTm", "SettlCcy",
    "SettlDt", "PxSubTyp", "VenueTyp", "VenuTyp", "OfstInst",
    "Typ", "Amt", "Ccy"
]

xml = etree.parse(xml_fname)

with open(csv_fname, "w") as f:
    writer = csv.DictWriter(f, fields, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    # Select every element whose tag is one of the three names
    for node in xml.xpath("//*[self::TrdCaptRpt or self::PosRpt or self::Amt]"):
        atts = node.attrib
        # Record which element this row comes from
        atts["elm_name"] = node.tag
        writer.writerow(atts)
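The modifications are:
- fields gained an extra "elm_name" field plus fields from the other elements (feel free to remove any you are not interested in).
- xml.xpath is used to iterate over the elements. The XPath expression is more complex, so I am not sure whether the stdlib ElementTree supports it.
- The element name is added to the atts dictionary under "elm_name".

Warning: the Amt element is nested inside PosRpt, and CSV cannot represent this tree structure. The records are written, but they carry no information about where they came from (other than following the record of their parent element).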
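One way to soften that limitation is to copy some identifying information from the parent into each row. The sketch below does this with the stdlib only, filtering by tag name instead of XPath. The columns parent_name and parent_id, the assumption that PosRpt carries an RptID attribute, the Batch wrapper, and all sample values are hypothetical and not taken from the original data.

```python
import csv
import io
from xml.etree import ElementTree

WANTED = {"TrdCaptRpt", "PosRpt", "Amt"}
# parent_name and parent_id are hypothetical extra columns for this sketch.
FIELDS = ["elm_name", "parent_name", "parent_id", "RptID", "Typ", "Amt", "Ccy"]

# Invented sample; the real attributes of PosRpt are not shown in the question.
SAMPLE = (
    '<Batch>'
    '<TrdCaptRpt RptID="T1"/>'
    '<PosRpt RptID="P1"><Amt Typ="FMTM" Amt="10.5" Ccy="USD"/></PosRpt>'
    '</Batch>'
)

root = ElementTree.fromstring(SAMPLE)
# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

buf = io.StringIO()
writer = csv.DictWriter(buf, FIELDS, delimiter=";", extrasaction="ignore")
writer.writeheader()
for node in root.iter():
    if node.tag not in WANTED:
        continue
    row = dict(node.attrib)  # copy, so the parsed tree is left unmodified
    row["elm_name"] = node.tag
    parent = parent_of.get(node)
    if parent is not None:
        row["parent_name"] = parent.tag
        row["parent_id"] = parent.get("RptID", "")
    writer.writerow(row)
lines = buf.getvalue().splitlines()
```

Each Amt row then names its PosRpt parent explicitly, so the relationship survives even if the rows are later sorted or filtered.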
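Answer 1 (score: 0)

You should first append each row, containing all the tags, to a list.

for node in tree.iter('TrdCaptRpt'):
    # ..... (attribute lookups as in the question)
    # Python lists use append; there is no push method
    my_list.append([RptID, TrdTyp, TrdSubTyp, TrdDt, BizDt,
                    MLegRptTyp, MtchStat, MsgEvtSrc, TrdID,
                    LastQty, LastPx, TxnTm, SettlCcy, SettlDt,
                    PxSubTyp, VenueTyp, VenuTyp, OfstInst])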
Then write each row to the file:

with open('/Users/anantsangar/Desktop/output.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in my_list:
        spamwriter.writerow(row)
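Put together, the list-of-rows approach could be sketched as follows. The shortened column list, the Batch wrapper, and the sample values are assumptions for illustration, and the sketch uses the default comma delimiter rather than the space delimiter above.

```python
import csv
import io
from xml.etree import ElementTree

# Shortened column list for the example.
COLUMNS = ["RptID", "TrdTyp", "LastQty"]

# Hypothetical sample data; the values are invented.
SAMPLE = (
    '<Batch>'
    '<TrdCaptRpt RptID="1" TrdTyp="0" LastQty="5"/>'
    '<TrdCaptRpt RptID="2" TrdTyp="1"/>'
    '</Batch>'
)

root = ElementTree.fromstring(SAMPLE)
# One inner list per element; missing attributes become empty strings.
rows = [
    [node.attrib.get(name, "") for name in COLUMNS]
    for node in root.iter("TrdCaptRpt")
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(COLUMNS)  # header row
writer.writerows(rows)    # one CSV line per collected row
lines = buf.getvalue().splitlines()
```

Driving the lookups from a column list avoids one variable per attribute and keeps the header and the data in the same order.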