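I've seen similar questions, but none of them really helped me. I need to read a text file, split it into words, and count the word lengths. I'm also trying to print the result as a table, with the word length on the left and the actual words on the right. My code is a complete mess right now, which is why I've decided to ask for help.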
a = open('owlcreek.txt').read().split()
lengths = dict()
for word in a:
    length = len(word)
    if length not in lengths:
        for length, counter in lengths.items():
            print "Words of length %d: %d" % (length, counter)
#words=[line for line in a]
#print ("\n" .join(counts))
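I also think I'll need to write a small parser to strip out all the punctuation such as "!--. I tried using Counter, but I guess I don't know how to use it properly.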
Answer 0 (score: 3)
It should look something like this:
a = open('owlcreek.txt').read().split()
lengths = dict()
for word in a:
    length = len(word)
    # if the key is not present, add it
    # ("not in" replaces dict.has_key, which was removed in Python 3)
    if length not in lengths:
        # the value should be the list of words
        lengths[length] = []
    # append the word to the list for length key
    lengths[length].append(word)
# print them out as length, count(words of that length)
for length, wrds in lengths.items():
    print("Words of length %d: %d" % (length, len(wrds)))
Hope this helps!
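Since the question mentions Counter, here is a minimal sketch of how `collections.Counter` and `collections.defaultdict` could do the same bookkeeping. The helper name `length_stats` and the sample sentence are made up for illustration; they are not from the original post.

```python
from collections import Counter, defaultdict

def length_stats(text):
    """Return (counts, groups) for the whitespace-separated words in text.

    Hypothetical helper, named here only for illustration.
    """
    words = text.split()
    # Counter maps each word length to the number of words with that length
    counts = Counter(len(word) for word in words)
    # defaultdict(list) creates the empty list the first time a key is used,
    # so no explicit "is the key present?" check is needed
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return counts, groups

# Sample text standing in for the contents of the file
counts, groups = length_stats("the owl flew over the creek")
for length in sorted(counts):
    print("Words of length %d: %d" % (length, counts[length]))
```

To run it on the real file, something like `length_stats(open('owlcreek.txt').read())` should work, assuming the file is in the current directory.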
Answer 1 (score: 0)
A simple regular expression is enough to strip out punctuation and whitespace.
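As a rough sketch of that idea, `re.findall` can pull out only the word-like runs, so punctuation such as `"`, `!` and `--` never reaches the word list. The function name `extract_words` and the sample sentence are assumptions for illustration only.

```python
import re

def extract_words(text):
    """Return the words in text with punctuation and whitespace dropped.

    Hypothetical helper, named here only for illustration.
    """
    # \w+'\w+ keeps contractions such as "don't" in one piece;
    # the plain \w+ alternative matches every other word.
    # Note that \w also matches digits and underscores.
    return re.findall(r"\w+'\w+|\w+", text)

sample = 'He said: "Stop!" -- but I don\'t think so.'
print(extract_words(sample))
```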
Edit: If I understand your question correctly, you want all the unique words in the text file, sorted by length. In that case:
import re
import itertools

with open('README.txt', 'r') as f:
    words = set(re.findall(r"\w+'\w+|\w+", f.read()))  # discard duplicates

# groupby only merges adjacent items, so sort by length first
sorted_words = sorted(words, key=len)
for length, group in itertools.groupby(sorted_words, len):
    group = list(group)
    print("Words of length {0}: {1}".format(length, len(group)))
    for word in group:
        print(word)
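If the goal is the table described in the question (length on the left, the words themselves on the right), one possible layout is sketched below. The function name `format_length_table`, the six-character column width, and the comma-separated word list are all assumptions, not something the question specifies.

```python
from collections import defaultdict

def format_length_table(words):
    """Return one formatted line per word length: length, then the words.

    Hypothetical helper; the column width and separator are arbitrary choices.
    """
    groups = defaultdict(set)  # a set per length discards duplicate words
    for word in words:
        groups[len(word)].add(word)
    lines = []
    for length in sorted(groups):
        # %-6d left-aligns the length in a six-character column
        lines.append("%-6d %s" % (length, ", ".join(sorted(groups[length]))))
    return lines

for line in format_length_table(["owl", "the", "creek", "owl"]):
    print(line)
```

Sorting the words inside each row keeps the output stable between runs, which a plain set would not guarantee.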