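I have a folder containing 20 text files, and I am trying to build a word-count dictionary from them and write that dictionary to a text file.
I wrote code that works for a single file in the directory by prompting for the file name. However, it does not let me enter several text files at once, and if I run it on each file separately, the outputs just overwrite one another. I tried changing the file input to use import os and read from the cwd, but I ran into variable errors and I am not sure what I am doing wrong.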
fname = input('Enter File: ')
hand = open(fname)
di = dict()
for lin in hand:
    lin = lin.rstrip()
    wds = lin.split()
    for w in wds:
        di[w] = di.get(w, 0) + 1
print(di)
largest = -1
theword = None
for k, v in di.items():
    if v > largest:
        largest = v
        theword = k
print(theword, largest)
f = open("output.txt", "w")
f.write(str(di))
f.close()
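I tried adding
import os
for filename in os.listdir(os.getcwd()):
    fname = ('*.txt')
    hand = open(fname)
at the top, but I got an error because open() does not recognize the wildcard, which I thought would assign fname to whichever file was being read.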
Answer 0 (score: 0)
If you want to use wildcards, you need the glob module. In your case, though, it sounds like you simply want every file in a directory, so:
for filename in os.listdir('.'):  # '.' is the cwd
    hand = open(filename)
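Putting the two ideas together, here is a minimal sketch that uses glob to pick up every .txt file and accumulates all counts in a single dictionary, so the output is written once instead of being overwritten per file. The names count_words and write_counts and the exclude parameter are made up for illustration. Skipping output.txt assumes the result is written into the same folder, and utf-8 is an assumption about the files.

```python
import glob
import os


def count_words(directory=".", exclude=("output.txt",)):
    """Count whitespace-separated words across all .txt files in directory.

    Hypothetical helper: the name and the exclude parameter are
    illustrative, not part of the original question.
    """
    di = {}
    # sorted() only makes the processing order predictable
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        if os.path.basename(path) in exclude:
            continue  # do not count a previous run's output file
        with open(path, encoding="utf-8") as hand:
            for lin in hand:
                for w in lin.split():
                    di[w] = di.get(w, 0) + 1
    return di


def write_counts(di, out_path="output.txt"):
    """Write the dictionary once, after every file has been read."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(str(di))
```

Calling write_counts(count_words()) from the folder that holds the files should then produce one output.txt covering all of them.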
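Answer 1 (score: 0)
You can loop over every .txt file in the directory and either print the contents of those files or store them in a dictionary or variable.
import os
for filename in os.listdir(os.getcwd()):
    name, file_extension = os.path.splitext(filename)
    # compare the extension exactly so that only .txt files are opened
    if file_extension == '.txt':
        with open(filename) as hand:
            for line in hand:
                print(line.rstrip())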
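If the per-file results should be kept apart rather than printed, one option is a dictionary keyed by file name. The following is a rough sketch only: counts_per_file is a hypothetical name, and utf-8 is an assumption about the files.

```python
import os


def counts_per_file(directory):
    """Map each .txt file name to its own word-count dictionary.

    Hypothetical helper for illustration; assumes utf-8 text files.
    """
    result = {}
    for filename in os.listdir(directory):
        name, file_extension = os.path.splitext(filename)
        if file_extension != ".txt":
            continue  # ignore anything that is not a .txt file
        counts = {}
        with open(os.path.join(directory, filename), encoding="utf-8") as hand:
            for line in hand:
                for word in line.split():
                    counts[word] = counts.get(word, 0) + 1
        result[filename] = counts
    return result
```

counts_per_file(os.getcwd()) returns a dictionary with one entry per .txt file, so one file's counts never overwrite another's. Note that a previously written output.txt in the same folder would be counted as well.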
Answer 2 (score: 0)
import glob
# a list of all .txt files in the current dir
files = glob.glob("*.txt")
# the dictionary that will hold the file names (key) and content (value)
dic = {}
# loop over the files and open each one
for file in files:
    if file == "output.txt":
        continue  # skip the output file, which also matches *.txt
    with open(file, 'r', encoding='utf-8') as read:
        # the key holds the file name, the value holds the content
        dic[file] = read.read()
    # for each file, append the name and the content to output.txt
    with open("output.txt", "a", encoding='utf-8') as output:
        output.write(file + "\n" + dic[file] + "\n\n")
Answer 3 (score: 0)
If you are using Python 3.4 or later, you can use pathlib.Path() and collections.Counter() to simplify the code:
from pathlib import Path
from collections import Counter
counter = Counter()
directory = Path('dir')  # the folder that holds the .txt files
out_file = Path('output.txt')
for file in directory.glob('*.txt'):
    with file.open('r', encoding='utf-8') as f:
        for l in f:
            counter.update(l.strip().split())
print(counter.most_common(10))  # the ten most frequent words
with out_file.open('w', encoding='utf-8') as f:
    f.write(str(dict(counter)))  # write() needs a str, not a Counter
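If you are using Python 3.5 or later, the code can be even simpler:
from pathlib import Path
from collections import Counter
counter = Counter()
directory = Path('dir')  # the folder that holds the .txt files
out_file = Path('output.txt')
for file in directory.glob('*.txt'):
    counter.update(file.read_text(encoding='utf-8').split())
print(counter.most_common(10))  # the ten most frequent words
# write_text() needs a str, not a Counter
out_file.write_text(str(dict(counter)), encoding='utf-8')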
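Keep in mind that a Counter is not a string, so it has to be converted before it can be written to a file. Besides str(), one option is to write one word per line in descending order of count, which is the order most_common() returns when called without an argument. A small sketch, where format_counts is a hypothetical helper and the tab-separated layout is just one possible choice:

```python
from collections import Counter


def format_counts(counter):
    """Render a Counter as 'word<TAB>count' lines, most frequent first.

    Hypothetical helper; the tab-separated layout is only an example.
    """
    # most_common() with no argument returns every (word, count) pair,
    # ordered from the highest count to the lowest
    return "".join(
        "{}\t{}\n".format(word, count) for word, count in counter.most_common()
    )
```

The result could then be passed to out_file.write_text(format_counts(counter), encoding='utf-8').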
Here is sample output from the counting loop in an interactive session:
>>> from pathlib import Path
>>> from collections import Counter
>>> counter = Counter()
>>> file = Path('t.txt')
>>> file.is_file()
True
>>> with file.open('r', encoding='utf-8') as f:
...     for l in f:
...         counter.update(l.strip().split())
...
>>> counter.most_common(5)
[('is', 10), ('better', 8), ('than', 8), ('to', 5), ('the', 5)]
>>>