Question

I want to convert a folder of text documents into the following format:

texts = ['text of document 1', 'text of document 2', 'text of document 3',...]

so that I can apply text-mining methods to it.

So far, my code looks like this:

import os
file= "*.txt"
path = "C:\\"
texts=[]

for files in os.listdir(path):
     with open(path + files) as f:
         for x in f:
             texts.append(x)

Unfortunately, the result is not what I want:

texts = ['line 1 of document 1', 'line 2 of document 1', …]

What am I doing wrong? Can anyone suggest improvements to my code?

Answer 1

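for x in f: (or, more generally, for line in file:) iterates over the lines of the file, so each line ends up as a separate entry in texts.

Use the .read() method instead. It reads the whole file into a single string:

for files in os.listdir(path):
     with open(path + files) as f:
         texts.append(f.read())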
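
To see the difference between iterating over a file and calling .read(), here is a small sketch. It uses an in-memory file (io.StringIO) only so that it runs without touching the disk, and the sample text is made up for illustration:

```python
import io

# Hypothetical content standing in for one .txt document.
sample = "line 1 of document 1\nline 2 of document 1\n"

# Iterating over a file object yields one string per line.
lines = [x for x in io.StringIO(sample)]

# .read() returns the whole content as a single string.
whole = io.StringIO(sample).read()

print(len(lines))       # 2
print(whole == sample)  # True
```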

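Edit: I just saw your comment about empty entries. If your directory contains empty files, you can avoid adding them like this:

for files in os.listdir(path):
     with open(path + files) as f:
         text = f.read()
         if text:  # skip empty files
             texts.append(text)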
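
The question defines file = "*.txt" but never uses it, so os.listdir(path) also returns subdirectories and non-text files. If only .txt files should be read, something along these lines may work. This is a sketch assuming Python 3 and pathlib; the function name read_texts and the UTF-8 default encoding are my assumptions, not something stated in the question:

```python
from pathlib import Path


def read_texts(folder, pattern="*.txt", encoding="utf-8"):
    """Return one string per non-empty file in `folder` matching `pattern`.

    `read_texts` is a hypothetical helper; the encoding is an assumption
    and may need to be changed to match the actual files.
    """
    texts = []
    # sorted() keeps the document order deterministic across runs.
    for file_path in sorted(Path(folder).glob(pattern)):
        if not file_path.is_file():
            continue  # skip directories that happen to match the pattern
        text = file_path.read_text(encoding=encoding)
        if text:  # skip empty files
            texts.append(text)
    return texts
```

With the path from the question, this would be called as texts = read_texts("C:\\").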
