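I have XML documents in the format shown below, and I have not found a way that works to convert them to CSV using Python. I am using the Spyder IDE and am very much an amateur Pythonista. I managed to convert one of the files with an online converter, but the rest are too large to upload.
The output I am looking for is rowID, PostID, Score, Text.
Could someone please help?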
<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="1" Score="5" Text="Was there something in particular you didn't understand in the wikipedia article? http://en.wikipedia.org/wiki/Spin_%28physics%29" CreationDate="2010-11-02T19:11:07.043" UserId="42" />
<row Id="2" PostId="3" Score="1" Text="I thought the wikipedia article here was pretty good, but maybe it only makes sense if you have a little quantum mechanics background: http://en.wikipedia.org/wiki/Particle_physics_and_representation_theory Were you able to get anything out of it?" CreationDate="2010-11-02T19:13:34.870" UserId="42" />
<row Id="3" PostId="3" Score="0" Text="i mostly thought this was a better place for the question than MO." CreationDate="2010-11-02T19:16:09.873" UserId="40" />
<row Id="6" PostId="4" Score="11" Text="An accurate answer, but if the poster doesn't understand the actual concept of spin (not to mention group theory), this is all but useless." CreationDate="2010-11-02T19:32:15.410" UserId="13" />
<row Id="7" PostId="2" Score="2" Text="I'm tempted to answer: with much difficulty, in a highly qualitative way, and only by reading a fair-sized book. There are many decent pop-sci books on string theory; I can't remember the names of any I read, but I'm sure someone can recommend one or two." CreationDate="2010-11-02T19:36:53.290" UserId="13" />
<row Id="8" PostId="8" Score="0" Text="so the fundamental particle is acting on the quantum states?" CreationDate="2010-11-02T19:36:55.263" UserId="40" />
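Secondly, if some rows are missing fields or have extra fields, how can I ignore those and populate only the specified fields? I get the following error message, but I do not want the extra 3 columns.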
ParserError: Error tokenizing data. C error: Expected 4 fields in line 41, saw 7
Answer 0 (score: 1)
The following worked for me:
{{1}}
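This could also be done with pandas, by building a DataFrame and saving it as CSV, but I kept it simple.
It produces a file in the same folder as the XML file, with the same name but ending in _out.csv.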
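For reference, here is a minimal sketch of one way to do what the answer describes, using only the standard library. It is not necessarily the answerer's exact code. The function name `xml_to_csv` and the `FIELDS` list are assumptions made for illustration. The sketch reads each `<row>` element's attributes, keeps only the four requested ones, writes an empty cell when an attribute is missing, and ignores any extra attributes. Because `csv.writer` quotes values that contain commas, comment text with commas should stay in a single column.

```python
import csv
import os
import xml.etree.ElementTree as ET

# Attributes to keep, in the column order requested in the question.
# (Hypothetical constant name, chosen for this sketch.)
FIELDS = ["Id", "PostId", "Score", "Text"]


def xml_to_csv(xml_path, fields=FIELDS):
    """Write the chosen attributes of every <row> element to <name>_out.csv."""
    out_path = os.path.splitext(xml_path)[0] + "_out.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(fields)  # header row
        # iterparse streams the document instead of loading it all at once.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "row":
                # Missing attributes become empty strings; extras are ignored.
                writer.writerow([elem.get(name, "") for name in fields])
            elem.clear()  # drop the element's attributes once it is handled
    return out_path
```

Calling `xml_to_csv("comments.xml")` (a hypothetical file name) would then write `comments_out.csv` next to the input file.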