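I need to read a text file, strip unnecessary punctuation, lowercase the words, and use a binary search tree to build a tree of the words in the file.
We are asked to count the frequency of repeated words, and to report the total word count and the total number of unique words.
So far I have handled the punctuation, the file reading, and the lowercasing, and the binary search tree is mostly done. I just need to figure out how to implement the "frequency" counter in the code.
My code is as follows: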
class BSearchTree :
    class _Node :
        def __init__(self, word, left = None, right = None) :
            self._word = word
            self._freq = 0      # named _freq to match its use in insert()
            self._left = left
            self._right = right

    def __init__(self) :
        self._root = None
        self._wordc = 0
        self._each = 0

    def isEmpty(self) :
        return self._root == None

    def search(self, word) :
        probe = self._root
        while (probe != None) :
            if word == probe._word :
                return probe
            if word < probe._word :
                probe = probe._left
            else :
                probe = probe._right
        return None

    def insert(self, word) :
        if self.isEmpty() :
            self._root = self._Node(word)
            self._root._freq += 1   # <- is this correct?
            return
        parent = None   # to keep track of parent
                        # we need above information to adjust
                        # link of parent of new node later
        probe = self._root
        while (probe != None) :
            if word < probe._word :     # go to left tree
                parent = probe          # before we go to child, save parent
                probe = probe._left
            elif word > probe._word :   # go to right tree
                parent = probe          # before we go to child, save parent
                probe = probe._right
        if (word < parent._word) :      # new value will be new left child
            parent._left = self._Node(word)
        else :                          # new value will be new right child
            parent._right = self._Node(word)
The formatting was killing me, so here is the second half of it.
class NotPresent(Exception) :
    pass

def main():
    t = BSearchTree()
    file = open("sample.txt")
    line = file.readline()
    file.close()
    #for word in line:
    #    t.insert(word)
    # Line above crashes program because there are too many
    # words to add. Lines on bottom tests BST class
    t.insert('all')
    t.insert('high')
    t.insert('fly')
    t.insert('can')
    t.insert('boars')
    #t.insert('all')  <- how do I handle duplicates by making
    #                    extras add to the node's frequency?
    t.inOrder()
Thanks for helping, or for trying to help!
Answer 0: (score: 0)
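First, it is better to initialize _freq to 1 in Node than to do it in BST's insert().
(Plus one more thing: Python coding conventions discourage spaces around the = when writing default parameter values.)
Then add three lines to insert() to handle a word that is already in the tree.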
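
A minimal sketch of how this might look, reusing the question's class layout. The counter names _total and _unique, the in_order generator, and the words_from_text helper are assumptions added for illustration and are not part of the original answer. The three-line else branch in insert() is one plausible reading of the lines the answer refers to.

```python
import string


class BSearchTree:
    """Binary search tree of words that tracks how often each word occurs."""

    class _Node:
        def __init__(self, word, left=None, right=None):
            self._word = word
            self._freq = 1  # a node is only created once its word has been seen
            self._left = left
            self._right = right

    def __init__(self):
        self._root = None
        self._total = 0   # assumed name: all words inserted, duplicates included
        self._unique = 0  # assumed name: number of distinct words (nodes)

    def insert(self, word):
        self._total += 1
        if self._root is None:
            self._root = self._Node(word)
            self._unique += 1
            return
        parent = None
        probe = self._root
        while probe is not None:
            if word < probe._word:    # go to left subtree
                parent = probe
                probe = probe._left
            elif word > probe._word:  # go to right subtree
                parent = probe
                probe = probe._right
            else:                     # duplicate: count it, add no node
                probe._freq += 1
                return
        # Fell off the tree: attach the new word under the last node visited.
        if word < parent._word:
            parent._left = self._Node(word)
        else:
            parent._right = self._Node(word)
        self._unique += 1

    def in_order(self):
        """Yield (word, frequency) pairs in alphabetical order."""
        stack, probe = [], self._root
        while stack or probe is not None:
            while probe is not None:
                stack.append(probe)
                probe = probe._left
            probe = stack.pop()
            yield probe._word, probe._freq
            probe = probe._right


def words_from_text(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()
```

To use it, call insert() once per word returned by words_from_text(), not once per character of a line, which is what the commented-out loop in the question does. Without the else branch, the while loop never advances past a node holding an equal word, so any repeated input hangs the program.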