Question

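I am trying to extract XML files from a zip file using Python. I want to extract the contents of each XML file and store them in some Python structure.

I am building a text classifier, and I want to be able to look at the files individually so that I can process them.

I am using a Jupyter notebook, so the zip file is in the same location as my code file.

I tried the following: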

import zipfile

# unzipping the file
data = zipfile.ZipFile('data.zip', 'r')

# iterating through the names of the files to open them
for name in data.namelist():
    # open the xml files and save them to a txt file
    with data.open(name) as inputFile, open('tweet.txt', 'w') as outputFile:
        for line in inputFile:
            newLine = line.strip()
            outputFile.write(newLine)

But I get an error message:

TypeError: write() argument must be str, not bytes
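
For context, `ZipFile.open` returns a binary file object, so every `line` read from it is `bytes`, while a file opened with mode `'w'` only accepts `str`. Below is a minimal sketch of one possible way around this: it decodes each archive member and keeps the text in a dictionary instead of writing everything to a single file. The helper name `read_zip_texts` is hypothetical, the UTF-8 encoding is an assumption about the data, and the in-memory archive only stands in for `data.zip`.

```python
import io
import zipfile


def read_zip_texts(zip_source, encoding="utf-8"):
    """Return {member name: decoded text} for every file in the archive.

    Hypothetical helper; the encoding default is an assumption.
    """
    texts = {}
    with zipfile.ZipFile(zip_source, "r") as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                # Skip directory entries, which have no content.
                continue
            with archive.open(name) as member:
                # member.read() returns bytes; decode them to get str.
                texts[name] = member.read().decode(encoding)
    return texts


# Build a tiny archive in memory as a stand-in for 'data.zip'.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.xml", '<author lang="en"></author>')
buffer.seek(0)

texts = read_zip_texts(buffer)
```

With a real archive on disk, the call would presumably be `read_zip_texts('data.zip')`. Keeping one entry per file name also avoids reopening the same output file in `'w'` mode on every loop iteration, which truncates it each time.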

Example XML file:

<author lang="en">
    <documents>
        <document><![CDATA[@Michael_J_Parry can't comment on the transphobic but the feminist girl in me is a bit gut reaction offended]]></document>
        <document><![CDATA[@NZStuff and to make it worse you put a pic of the sisters holding trophies onto the hotel story... hoping this was accident, guessing not]]></document>
        <document><![CDATA[@NZStuff Serena wins 23rd title &amp; you lead with a hotel bill Tennis Auck has no qualms with, burying the story of her record breaking win]]></document>
        <document><![CDATA[Tietjens is now coaching Samoa!! I might be a little slow to pick up on this, sorry. But how awesome is that.]]></document>
    </documents>
</author>
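
A file shaped like the sample above could be turned into a Python structure along these lines. This is only a sketch using the standard library's `xml.etree.ElementTree`; the function name `parse_author` and the dictionary layout are illustrative choices, and the shortened `SAMPLE` text is made up to mirror the structure of the real files.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up stand-in with the same structure as the sample file.
SAMPLE = """<author lang="en">
    <documents>
        <document><![CDATA[first tweet &amp; more]]></document>
        <document><![CDATA[second tweet]]></document>
    </documents>
</author>"""


def parse_author(xml_text):
    """Return the author's language and a list of document texts.

    Hypothetical helper; the dictionary layout is an illustrative choice.
    """
    root = ET.fromstring(xml_text)
    return {
        "lang": root.get("lang"),
        # CDATA content is exposed as the element's ordinary text.
        "documents": [doc.text or "" for doc in root.iter("document")],
    }


parsed = parse_author(SAMPLE)
```

`ET.fromstring` accepts both `str` and `bytes`, so the raw bytes read from a zip member can be passed to it directly. Entity references inside a CDATA section are not decoded by the parser, so `&amp;` stays literal in the resulting text; `html.unescape` is one option if the decoded character is wanted.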

Any tips on how to solve this would be helpful. Thanks.

Extracting XML files from a zip in Python
