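I am trying to write a program that displays the concordance of a file. It should output the unique words and their frequencies in alphabetical order. This is what I have, but it doesn't work. Any tips?
FYI: I know very little about computer programming! I'm taking this class to satisfy a high school math endorsement requirement.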
f = open(raw_input("Enter a filename: "), "r")
myDict = {}
linenum = 0

for line in f:
    line = line.strip()
    line = line.lower()
    line = line.split()
    linenum += 1

for word in line:
    word = word.strip()
    word = word.lower()

    if not word in myDict:
        myDict[word] = []

    myDict[word].append(linenum)

print "%-15s %-15s" %("Word", "Line Number")
for key in sorted(myDict):
    print '%-15s: %-15d' % (key, myDict(key))
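Answer 0 (score: 1)
You need to use myDict[key], with square brackets, to look up an entry in the dictionary. Since each value is a list, use len(myDict[key]) to get the frequency (count).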
f = "HELLO HELLO HELLO WHAT ARE YOU DOING"
myDict = {}
linenum = 0

for word in f.split():
    if not word in myDict:
        myDict[word] = []

    myDict[word].append(linenum)

print "%-15s %-15s" %("Word", "Frequency")
for key in sorted(myDict):
    print '%-15s: %-15d' % (key, len(myDict[key]))
Result:
Word            Frequency
ARE            : 1
DOING          : 1
HELLO          : 3
WHAT           : 1
YOU            : 1
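To make the difference between the two lookups concrete, here is a minimal Python 3 sketch (the code above is Python 2). The names `positions`, `call_error`, and `frequency` are made up for this illustration and do not come from the question.

```python
# Hypothetical data: each word maps to the list of line numbers where it occurs.
positions = {"hello": [1, 1, 2], "what": [2]}

# Calling the dictionary like a function fails, because a dict is not callable.
try:
    positions("hello")
    call_error = None
except TypeError as exc:
    call_error = exc

# Square brackets perform the lookup; len() of the stored list is the frequency.
frequency = {word: len(lines) for word, lines in positions.items()}

for word in sorted(frequency):
    print("%-15s: %d" % (word, frequency[word]))
```

Note that even with square brackets, formatting the list itself with `%d` would still fail, which is why the count has to be taken with `len()` first.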
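Answer 1 (score: 1)
Your indentation is wrong. The second loop is outside the first loop, so it only processes the last line. (Consider using 4 spaces per indentation level to see this more easily.) Your print statement is also wrong, and you are printing line numbers rather than word counts.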
myDict = {}
linenum = 0

for line in f:
    line = line.strip()
    line = line.lower()
    line = line.split()
    linenum += 1
    for word in line:
        word = word.strip()
        word = word.lower()

        if not word in myDict:
            myDict[word] = []

        myDict[word].append(linenum)

print "%-15s %5s %s" %("Word", 'Count', "Line Numbers")
for key in sorted(myDict):
    print '%-15s %5d: %s' % (key, len(myDict[key]), myDict[key])
Sample output:
Word            Count Line Numbers
-                   1: [6]
a                   4: [2, 2, 3, 7]
about               1: [6]
alphabetical        1: [4]
Edit: fixed a bug in the code.
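For readers on Python 3, the same nested-loop idea might look roughly like the sketch below. The function name `build_concordance` and the sample lines are assumptions made for this example; it takes any iterable of lines (an open file would work as well) rather than prompting for a filename.

```python
def build_concordance(lines):
    """Map each lowercased word to the list of 1-based line numbers it appears on."""
    concordance = {}
    for linenum, line in enumerate(lines, start=1):
        # The word loop is nested inside the line loop, so every line is processed.
        for word in line.lower().split():
            concordance.setdefault(word, []).append(linenum)
    return concordance


# Hypothetical input standing in for the lines of a file.
sample = ["A cat sat", "a dog sat on a mat"]
result = build_concordance(sample)

print("%-15s %5s %s" % ("Word", "Count", "Line Numbers"))
for word in sorted(result):
    print("%-15s %5d: %s" % (word, len(result[word]), result[word]))
```

Using `enumerate` removes the need to increment a line counter by hand, and `setdefault` replaces the explicit "if not in the dictionary" check.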
Answer 2 (score: 0)
Here is my solution for the concordance problem...
https://github.com/jrgosalia/Python/blob/master/problem2_concordance.py
$ python --version
Python 3.5.1
# library.py
import os

def getLines(fileName):
    """ getLines validates the given fileName.
        Returns the full contents of a valid file as one string. """
    lines = ""
    if (fileName is not None and len(fileName) > 0 and os.path.exists(fileName)):
        if os.path.isfile(fileName):
            # The with block closes the file once it has been read
            with open(fileName, 'r') as file:
                lines = file.read()
            if (len(lines) > 0):
                return lines
            else:
                print("<" + fileName + "> is an empty file!", end="\n\n")
        else:
            print("<" + fileName + "> is not a file!", end="\n\n")
    else:
        print("<" + fileName + "> doesn't exist, try again!", end="\n\n")
    return lines
# problem2_concordance.py
from library import getLines

# List of English Punctuation Symbols
# Reference : Took maximum punctuation symbols possible from https://en.wikipedia.org/wiki/Punctuation_of_English
# NOTE: The single quote is in the list, so apostrophes are stripped too ("don't" becomes "don" and "t").
punctuations = ["[", "]", "(", ")", "{", "}", "<", ">",
                ":", ";", ",", "`", "'", "\"", "-", ".",
                "|", "\\", "?", "/", "!", "-", "_", "@",
                "#", "$", "%", "^", "&", "*", "+", "~", "="]

def stripPunctuation(data):
    """ Replace every punctuation symbol in the given string with a space. """
    for punctuation in punctuations:
        data = data.replace(punctuation, " ")
    return data

def display(wordsDictionary):
    """ Display sorted dictionary of words and their frequencies. """
    noOfWords = 0
    print("-" * 42)
    print("| %20s | %15s |" % ("WORDS".center(20), "FREQUENCY".center(15)))
    print("-" * 42)
    for word in sorted(wordsDictionary):
        noOfWords += 1
        print("| %-20s | %15s |" % (word, str(wordsDictionary.get(word)).center(15)))
        # Halt every 20 words (configurable) and reprint the table header
        if (noOfWords % 20 == 0):
            print("\n" * 2)
            input("PRESS ENTER TO CONTINUE ... ")
            print("\n" * 5)
            print("-" * 42)
            print("| %20s | %15s |" % ("WORDS".center(20), "FREQUENCY".center(15)))
            print("-" * 42)
    print("-" * 42)
    print("\n" * 2)

def prepareDictionary(words):
    """ Prepare dictionary of words and count their occurrences. """
    wordsDictionary = {}
    for word in words:
        # Count words by their lowercase version
        key = word.lower()
        if key in wordsDictionary:
            # Handle subsequent occurrences
            wordsDictionary[key] += 1
        else:
            # Handle first occurrence
            wordsDictionary[key] = 1
    return wordsDictionary

def main():
    """ Main method """
    print("\n" * 10)
    print("Given a file name, program will find unique words and their occurrences!", end="\n\n")
    input("Press ENTER to start execution ... \n")

    lines = ""
    # Keep asking until a valid, non-empty input file is given
    while (len(lines) == 0):
        fileName = input("Enter the file name (RELATIVE ONLY and NOT ABSOLUTE): ")
        print("\n\n" * 1)
        lines = getLines(fileName)
    # Get all words by removing all punctuation
    words = stripPunctuation(lines).split()
    # Prepare the words dictionary (word -> frequency)
    wordsDictionary = prepareDictionary(words)
    # Display words dictionary
    display(wordsDictionary)

# Starting point
if __name__ == "__main__":
    main()
Note: you also need library.py to run the code above; it is in the same GitHub repository.
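As an alternative to a hand-written punctuation list, the standard library's `string.punctuation` together with `str.translate` can do the same stripping in one pass. This is a swapped-in technique, not what the answer's repository does, and the names `SEPARATORS`, `TABLE`, `strip_punctuation`, and `count_words` are hypothetical. The sketch deliberately leaves the apostrophe out of the separator set so that contractions stay whole; that is a choice made for this example.

```python
import string

# Assumption: every ASCII punctuation character except the apostrophe
# is treated as a word separator.
SEPARATORS = string.punctuation.replace("'", "")
TABLE = str.maketrans(SEPARATORS, " " * len(SEPARATORS))


def strip_punctuation(text):
    """Replace each separator character with a space."""
    return text.translate(TABLE)


def count_words(text):
    """Return a dict mapping each lowercased word to its frequency."""
    counts = {}
    for word in strip_punctuation(text).lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("Hello, hello! Don't stop (please).")
for word in sorted(counts):
    print("%-20s %d" % (word, counts[word]))
```

Only ASCII punctuation is covered here; typographic quotes or dashes in the input would need to be added to the separator set.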
Answer 3 (score: 0)
Why not use Counter? This is exactly what it is for:
In [7]: from collections import Counter
In [8]: s = 'How many times does each word show up in this sentence word word show up up'
In [9]: words = s.split()
In [10]: Counter(words)
Out[10]: Counter({'up': 3, 'word': 3, 'show': 2, 'times': 1, 'sentence': 1, 'many': 1, 'does': 1, 'How': 1, 'each': 1, 'in': 1, 'this': 1})
Note: I can't take credit for this particular solution. It comes straight from Collections Module counter Python Bootcamp.
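Counter on its own does not give the alphabetical, case-insensitive listing the question asks for, so a little glue is still needed. A minimal sketch, assuming the text is already in memory as a string (the function name `word_frequencies` and the sample sentence are made up for this example):

```python
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive, whitespace-separated words in a string."""
    return Counter(text.lower().split())


freq = word_frequencies("Word word show up up Up")

# Iterating over sorted(freq) walks the words in alphabetical order.
for word in sorted(freq):
    print("{:<15} {}".format(word, freq[word]))
```

For a file, the same function could be given the result of reading the file's contents.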
Answer 4 (score: 0)
Concordance of a text file, in alphabetical order:
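f = input('Enter the input file name: ')
# Named counts rather than list, to avoid shadowing the built-in list type
counts = {}
# The with block closes the file automatically
with open(f, "r") as inputFile:
    for word in inputFile.read().split():
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
# Print each word with its frequency, in alphabetical order
for i in sorted(counts):
    print("{0} {1} ".format(i, counts[i]))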