Question

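I have collected 20 text files in a folder, and I am trying to build a dictionary from them and write that dictionary out to a text file.

I wrote code that works for a single file in the directory by entering the file name. However, it does not let me enter multiple text files at once, and if I run each text file separately, the outputs just overwrite one another. I tried changing the file input to use import os and read from the cwd, but I ran into variable errors and I am not sure what I am doing wrong.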

fname = input ('Enter File: ')
hand = open(fname)

di = dict()
for lin in hand:
    lin = lin.rstrip()
    wds = lin.split()
    for w in wds:
        di[w] = di.get(w,0) + 1

print(di)


largest = -1
theword = None
for k,v in di.items() : 
    if v > largest : 
        largest = v
        theword = k

print(theword,largest)

f = open("output.txt", "w")
f.write(str(di))
f.close()

I tried adding

import os
for filename in os.listdir(os.getcwd()):
    fname = ('*.txt')
    hand = open(fname)

at the top, but I got an error because it does not recognize the wildcard, which I thought would assign fname to whichever file was being read.

Answer 1

If you want to use wildcards, you need the glob module. In your case, though, it sounds like you just want every file in one directory, so:

import os

for filename in os.listdir('.'):  # '.' is the cwd
    hand = open(filename)
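
Combined with the counting loop from the question, this could look roughly like the sketch below. The function name count_words and the .txt filter are assumptions added for illustration, not part of the original code. The sketch keeps one dictionary for all files, so the counts accumulate instead of overwriting each other.

```python
import os


def count_words(directory="."):
    """Count whitespace-separated words across every .txt file in `directory`."""
    di = dict()
    for filename in sorted(os.listdir(directory)):
        # os.listdir returns bare names, so skip anything that is not a
        # .txt file and join the name with the directory before opening it.
        if not filename.endswith(".txt"):
            continue
        path = os.path.join(directory, filename)
        with open(path, encoding="utf-8") as hand:
            for lin in hand:
                for w in lin.split():
                    di[w] = di.get(w, 0) + 1
    return di


# Example use (the folder name "texts" is hypothetical):
# di = count_words("texts")
# with open("output.txt", "w", encoding="utf-8") as f:
#     f.write(str(di))
```

If output.txt is written into the same folder that is being scanned, a second run would count its contents too, so it may be safer to write it somewhere else or skip it by name.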

Answer 2

You can loop over every .txt file in the directory and print the contents of those text files, or store them in a dictionary or variable.

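import os

for filename in os.listdir(os.getcwd()):
    name, file_extension = os.path.splitext(filename)
    # Only process files whose extension is exactly .txt
    if file_extension == '.txt':
        with open(filename) as hand:
            for line in hand:
                # each line already ends with a newline
                print(line, end='')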
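
To store results in a dictionary rather than print them, one option is a dictionary keyed by file name, so that no file overwrites another. This is only a sketch: the function name per_file_counts and the nested {filename: {word: count}} layout are assumptions chosen for illustration.

```python
import os


def per_file_counts(directory="."):
    """Return {filename: {word: count}} for every .txt file in `directory`."""
    results = {}
    for filename in sorted(os.listdir(directory)):
        name, file_extension = os.path.splitext(filename)
        if file_extension != ".txt":
            continue
        counts = {}
        # Join with the directory so this also works outside the cwd.
        with open(os.path.join(directory, filename), encoding="utf-8") as hand:
            for line in hand:
                for word in line.split():
                    counts[word] = counts.get(word, 0) + 1
        results[filename] = counts
    return results
```

Each file gets its own inner dictionary, which makes it possible to compare files afterwards or to merge the counts later if a single total is wanted.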

Answer 3

import glob

# a list of all .txt files in the current directory
files = glob.glob("*.txt")

# the dictionary that will hold the file names (keys) and contents (values)
dic = {}
# loop over the files and open each one
for file in files:
    with open(file, 'r', encoding='utf-8') as read:
        # the key holds the file name, the value holds the content
        dic[file] = read.read()
        # for each file, append the name and the content to output.txt
        with open("output.txt", "a", encoding='utf-8') as output:
            output.write(file + "\n" + dic[file] + "\n\n")
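
One thing to watch with this approach: output.txt itself matches "*.txt", so a second run in the same directory would read the previous output back in as input. A possible way around that, sketched here with a hypothetical helper named collect_texts, is to skip the output file by name.

```python
import glob
import os


def collect_texts(directory=".", output_name="output.txt"):
    """Return {filename: content} for .txt files, skipping the output file."""
    dic = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        filename = os.path.basename(path)
        # The output file also ends in .txt, so leave it out explicitly.
        if filename == output_name:
            continue
        with open(path, "r", encoding="utf-8") as read:
            dic[filename] = read.read()
    return dic
```

Writing the output to a different directory or giving it a different extension would avoid the problem as well.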

Answer 4

If you are using Python 3.4 or later, you can use pathlib.Path() and collections.Counter() to simplify the code:

from pathlib import Path
from collections import Counter

counter = Counter()
dir = Path('dir')
out_file = Path('output.txt')

for file in dir.glob('*.txt'):
    with file.open('r', encoding='utf-8') as f:
        for l in f:
            counter.update(l.strip().split())

print(counter.most_common(10))

with out_file.open('w', encoding='utf-8') as f:
    # write() needs a string, so convert the Counter first
    f.write(str(counter))

If you are using Python 3.5 or later, the code can be even simpler:

from pathlib import Path
from collections import Counter

counter = Counter()
dir = Path('dir')
out_file = Path('output.txt')

for file in dir.glob('*.txt'):
    counter.update(file.read_text(encoding='utf-8').split())

print(counter.most_common(10))
# write_text() needs a string, so convert the Counter first
out_file.write_text(str(counter), encoding='utf-8')
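
If the counts need to be read back later, a plain string dump of the Counter is awkward to parse. One alternative, shown here only as a sketch, is to store the counts as JSON. The helper names save_counts and load_counts are made up for this example; json.dumps accepts a Counter because Counter is a dict subclass.

```python
import json
from collections import Counter
from pathlib import Path


def save_counts(counter, out_file):
    """Write the word counts to `out_file` as JSON text."""
    text = json.dumps(counter, ensure_ascii=False, indent=2)
    Path(out_file).write_text(text, encoding="utf-8")


def load_counts(out_file):
    """Read counts written by save_counts back into a Counter."""
    data = json.loads(Path(out_file).read_text(encoding="utf-8"))
    return Counter(data)
```

After save_counts(counter, 'output.json'), calling load_counts('output.json') should give back a Counter equal to the original, so most_common() keeps working on the reloaded data.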

Here is some sample output:

>>> from pathlib import Path
>>> from collections import Counter
>>> counter = Counter()
>>> file = Path('t.txt')
>>> file.is_file()
True
>>> with file.open('r', encoding='utf-8') as f:
...     for l in f:
...             counter.update(l.strip().split())
... 
>>> counter.most_common(5)
[('is', 10), ('better', 8), ('than', 8), ('to', 5), ('the', 5)]
>>>

How do I make Python read all the files in a directory into a dictionary?