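So I have this messy code. I want to take every word from frankenstein.txt, sort the words alphabetically, remove one- and two-letter words, and write the rest to a new file.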
```python
def Dictionary():
    d = []
    count = 0
    bad_char = '~!@#$%^&*()_+{}|:"<>?\`1234567890-=[]\;\',./ '
    replace = ' '*len(bad_char)
    table = str.maketrans(bad_char, replace)
    infile = open('frankenstein.txt', 'r')
    for line in infile:
        line = line.translate(table)
        for word in line.split():
            if len(word) > 2:
                d.append(word)
                count += 1
    infile.close()
    file = open('dictionary.txt', 'w')
    file.write(str(set(d)))
    file.close()

Dictionary()
```
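How can I simplify this and make it more readable? Also, how do I write the words vertically in the new file? Right now they come out as one horizontal list, but I want: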
```
abbey
abhorred
about
etc.
```
Answer 0 (score: 0)
A few improvements:
```python
from string import digits, punctuation

def create_dictionary():
    words = set()
    bad_char = digits + punctuation + '...'  # may need more characters
    replace = ' ' * len(bad_char)
    table = str.maketrans(bad_char, replace)
    with open('frankenstein.txt') as infile:
        for line in infile:
            line = line.strip().translate(table)
            for word in line.split():
                if len(word) > 2:
                    words.add(word)
    with open('dictionary.txt', 'w') as outfile:
        # writelines() adds no separators, so append '\n' to each word
        outfile.writelines(word + '\n' for word in sorted(words))

create_dictionary()
```
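To try the translate-based approach without frankenstein.txt, the same steps can be pulled into two small functions that work on any iterable of lines and any writable text stream. This is a sketch under a few assumptions: the names `extract_words` and `write_words` are hypothetical, the sample text is made up, and `io.StringIO` stands in for the real input and output files. Writing `word + '\n'` for each word is what produces one word per line.

```python
import io
from string import digits, punctuation

# Map every digit and punctuation character to a space, as in the answer above.
BAD_CHARS = digits + punctuation
TABLE = str.maketrans(BAD_CHARS, ' ' * len(BAD_CHARS))


def extract_words(lines):
    """Return the unique words longer than two characters, sorted."""
    words = set()
    for line in lines:
        for word in line.translate(TABLE).split():
            if len(word) > 2:
                words.add(word)
    return sorted(words)


def write_words(words, outfile):
    """Write one word per line; writelines() itself adds no newlines."""
    outfile.writelines(word + '\n' for word in words)


# Usage with in-memory "files" instead of frankenstein.txt / dictionary.txt.
source = io.StringIO("An old abbey, 1818!\nWe talked about it, and about it.\n")
target = io.StringIO()
write_words(extract_words(source), target)
print(target.getvalue())
```

Note that `sorted` compares strings by code point, so capitalized words sort before all lowercase ones; `sorted(words, key=str.lower)` is one way to get a case-insensitive order if that matters.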
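Some notes:

- `string` contains constants you can use to supply the "bad characters";
- `count` is never used (and it would just be `len(d)` anyway);
- use the `with` context manager for file handling, so the files are closed automatically;
- a `set` prevents duplicates but does not sort them (hence `sorted`);
- `writelines` does not add line endings, so each word needs its own `'\n'` to end up on a separate line.

Answer 1 (score: 0)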
Use the `re` module:
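```python
import re

words = set()
with open('frankenstein.txt') as infile:
    for line in infile:
        # Split on runs of non-letters; '+' (not '*') avoids empty matches.
        words.update(x for x in re.split(r'[^A-Za-z]+', line) if len(x) > 2)

with open('dictionary.txt', 'w') as outfile:
    # One word per line.
    outfile.writelines(word + '\n' for word in sorted(words))
```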
In the character class `[^A-Za-z]` passed to `re.split`, replace `A-Za-z` with whichever characters you want to keep in dictionary.txt.
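Whether the quantifier after the character class is `*` or `+` matters. With `*` the pattern can match an empty string, and since Python 3.7 `re.split` also splits on empty matches, which breaks every word into single characters that the `len(x) > 2` filter then discards. The sketch below compares the two on a made-up line, and also shows an alternative that is not from the answer: `re.findall` with `{3,}` matches the words directly, so the length check moves into the pattern.

```python
import re

line = "An old abbey, 1818!"

# '*' can match the empty string; on Python 3.7+ re.split splits there too,
# so every piece ends up at most one character long.
pieces = re.split(r'[^A-Za-z]*', line)
print(max(len(p) for p in pieces))  # 1

# '+' needs at least one separator character, so words stay whole.
print(re.split(r'[^A-Za-z]+', line))  # ['An', 'old', 'abbey', '']

# Alternative (not from the answer): match the words themselves.
# {3,} means "three or more", which replaces the len(x) > 2 check.
print(re.findall(r'[A-Za-z]{3,}', line))  # ['old', 'abbey']
```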