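I'm trying to use Python to extract XML files from a zip file. I want to pull out the contents of each XML file and store them in some kind of Python data structure.
I'm building a text classifier, and I want to be able to look at each file individually so that I can process it.
I'm working in a Jupyter notebook, and the zip file is in the same directory as my notebook.
I tried the following: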
import zipfile

# open the zip archive
data = zipfile.ZipFile('data.zip', 'r')
# iterate through the names of the files to open them
for name in data.namelist():
    # open the xml files and save them to a txt file
    with data.open(name) as inputFile, open('tweet.txt', 'w') as outputFile:
        for line in inputFile:
            newLine = line.strip()
            outputFile.write(newLine)
but I get the following error:
TypeError: write() argument must be str, not bytes
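The error comes from a mode mismatch: `ZipFile.open()` returns a binary stream, so iterating over it yields `bytes`, while `open('tweet.txt', 'w')` is a text-mode file whose `write()` only accepts `str`. One way around it is to wrap the member in `io.TextIOWrapper` so lines are decoded to `str`. The sketch below assumes the XML files are UTF-8. It also writes one output file per member, because reopening `tweet.txt` in `'w'` mode on every iteration truncates it and keeps only the last file. The function name `zip_xml_to_text` and the `<member>.txt` naming scheme are hypothetical.

```python
import io
import os
import zipfile


def zip_xml_to_text(zip_path, out_dir="."):
    """Copy each .xml member of zip_path to its own .txt file in out_dir.

    Assumes the XML files are UTF-8 encoded. Returns the written paths.
    """
    written = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip directory entries and non-XML members
            # Hypothetical naming: "folder/foo.xml" -> "<out_dir>/foo.txt".
            # Members with the same base name in different folders would collide.
            base = os.path.splitext(os.path.basename(name))[0]
            out_path = os.path.join(out_dir, base + ".txt")
            # archive.open() yields bytes; TextIOWrapper decodes them to str,
            # which is what the text-mode output file's write() expects.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8") as inputFile, \
                    open(out_path, "w", encoding="utf-8") as outputFile:
                for line in inputFile:
                    # Strip surrounding whitespace but keep one line per line.
                    outputFile.write(line.strip() + "\n")
            written.append(out_path)
    return written
```

In the notebook this would be called as `zip_xml_to_text('data.zip')`. If you only need the raw text in memory, `data.read(name).decode('utf-8')` is a shorter alternative to the wrapper.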
Example XML file:
<author lang="en">
<documents>
<document><![CDATA[@Michael_J_Parry can't comment on the transphobic but the feminist girl in me is a bit gut reaction offended]]></document>
<document><![CDATA[@NZStuff and to make it worse you put a pic of the sisters holding trophies onto the hotel story... hoping this was accident, guessing not]]></document>
<document><![CDATA[@NZStuff Serena wins 23rd title & you lead with a hotel bill Tennis Auck has no qualms with, burying the story of her record breaking win]]></document>
<document><![CDATA[Tietjens is now coaching Samoa!! I might be a little slow to pick up on this, sorry. But how awesome is that.]]></document>
</documents>
</author>
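Since the goal is to keep each file's contents in a Python structure for a classifier, it may be simpler to skip the intermediate text file and parse the XML directly. The sketch below uses the standard library's `xml.etree.ElementTree`, which accepts the binary stream from `ZipFile.open()` as-is (so the bytes/str issue does not arise) and returns CDATA content as ordinary element text. It assumes every member has the `<author lang="..."><documents><document>` layout shown above; the function name `load_tweets` and the shape of the returned dict are hypothetical choices.

```python
import xml.etree.ElementTree as ET
import zipfile


def load_tweets(zip_path):
    """Map each XML member name to its author language and list of tweets."""
    corpus = {}
    with zipfile.ZipFile(zip_path, "r") as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip directory entries and non-XML members
            # ET.parse reads the binary stream directly and decodes it using
            # the XML declaration (UTF-8 if none), so no manual decode is needed.
            with archive.open(name) as f:
                root = ET.parse(f).getroot()  # the <author> element
            corpus[name] = {
                "lang": root.get("lang"),
                # CDATA sections come back as plain element text.
                "documents": [doc.text or "" for doc in root.iter("document")],
            }
    return corpus
```

For example, `corpus = load_tweets('data.zip')` followed by `corpus[name]["documents"]` gives the tweets of one file as a list of strings, which keeps each file separate while staying easy to flatten into a training set.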
Any tips on how to fix this would be helpful. Thanks!