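I am writing a program that reads words from a text file and puts all of them in a linked list. The file has no punctuation, only words. I also want to compare the linked list against a preloaded blacklist, which is also a linked list.
What I have working so far: I can load the linked list from the file, print the linked list, check its size, count how often each word occurs in the file, skip printing words below a specified frequency, and convert all words to lowercase for easier processing.
The problem I am having is getting the code right so that a word that occurs more than once is printed only once. If the word "the" appears 20 times, I don't want it to print "< 1>", then "< 2>" the next time it shows up, and so on up to "< 20>". I just want it to print "< 20>" once.
I have posted my load-file functions, print functions, and insert-word functions, all of which are part of class wordCloud.
Here is the code: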
void wordCloud::insertWord(string aWord){
    wordNode *newWord = new wordNode(aWord);
    //old code
    if (head == NULL)
        head = newWord;
    else{
        newWord->next = head;
        head = newWord;
    }
    //revised code
    //newWord->next = head;
    //head = newWord;
    size++;
}
void wordCloud::insertWordDistinct(string word){
    for (wordNode *temp = head; temp != NULL; temp = temp->next){
        if (word == temp->myWord){
            temp->freq_count++;
            //cout << temp->freq_count; //for debugging
        }
    }
    insertWord(word);
}
void wordCloud::printWordCloud(int freq){
    wordNode *temp, *previous;
    int listSize = 0;
    if (head == NULL) //determines if there are any words in the list
        cout << "No Word Cloud" << endl;
    else{
        temp = head;
        while (temp->next != NULL){ //prints each word until the list is NULL
            if (temp->freq_count >= freq){
                cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
                temp = temp->next;
                listSize++;
            }
            else{
                previous = temp;
                temp = temp->next;
                previous = NULL;
                free(previous);
            }
        }
    }
    cout << "\nThere are " << size << " words in the file.\n"; //print file size - for debugging - works
    cout << "\nThere are " << listSize << " words in the list\n\n"; //print list size - for debugging - works
    system("pause");
}
void wordCloud::printBlacklist(){
    wordNode *temp;
    if (head == NULL) //determines if there is a list
        cout << "No Words in the blacklist" << endl;
    else{
        temp = head;
        while (temp != NULL){ //prints each word until the list is NULL
            cout << temp->myWord << endl;
            temp = temp->next;
        }
    }
    cout << "\nThere are " << size << " words in the file.\n\n"; //print size - for debugging - works
    system("pause");
}
void wordCloud::loadWordCloud(string fileName){
    ifstream file; //variable for fileName
    string word; //string to hold each word
    file.open(fileName); //open file
    if (!file) { //error handling
        cout << "Error: Can't open the file. File may not exist.\n";
        exit(1);
    }
    while (!file.eof()){
        file >> word; //grab a word from the file one at a time
        insertWordDistinct(changeToLowerCase(word));
        //insertWord(word); //for debugging
        //cout << word <<'\n'; //print word - for debugging
    }
    //printWordCloud(); //print word cloud - for debugging - works
    file.close(); //always make sure to close file after read
}
void wordCloud::loadBlacklist(string fileName){
    ifstream file; //variable for fileName
    string bannedWord; //string to hold each word
    file.open(fileName); //open file
    if (!file) { //error handling if file does not load
        cout << "Error: Can't open the file. File may not exist.\n";
        exit(1);
    }
    while (!file.eof()){
        file >> bannedWord; //grab a word from the file one at a time
        if (bannedWord.empty()){ //error handling if file is empty
            cout << "File is empty!!\n";
            exit(1);
        }
        insertWord(changeToLowerCase(bannedWord));
        //cout << bannedWord << '\n'; //print blacklist words - for debugging
    }
    //printBlacklist(); //print blacklist - for debugging - works
    file.close(); //always make sure to close file after read
}
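I noticed that if I set previous = NULL before calling free(), my program doesn't crash and I don't get any DLL memory-handling errors. In fact, I can take out free() completely and it seems to work just fine. I just don't know whether this is the right way to do it. It seems to me that just pointing a node at NULL doesn't necessarily delete the data in memory. I am uneasy about not using free() or delete to get rid of the node. If I'm wrong, please correct me, or point me in the right direction.
Basically, what is wrong with this: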
wordNode *previous, *temp = head;
while (temp != NULL){
    if (word == temp->myWord){
        temp->freq_count++;
        previous = temp;
        temp = temp->next;
        delete(previous);
    }
}
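I can probably spot the bug there, but basically I just need to find the frequency of each word inserted into the list, and then delete the extra nodes containing that word until only the node with the highest frequency count is left to print. I am trying to do this in insertWordDistinct(string word). I just don't know how to go about it.
Answer 0 (score: 2)
Your print loop is doing you no favors. It should be a simple enumeration filtered on the minimum frequency. No deleting, freeing, or other memory management should happen there. Just walk the list: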
void wordCloud::printWordCloud(int freq)
{
    int listSize = 0;
    int uniqSize = 0;
    for (wordNode *temp = head; temp; temp = temp->next)
    {
        if (temp->freq_count >= freq)
        {
            cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
            listSize += temp->freq_count;
            ++uniqSize;
        }
    }
    cout << "\nThere are " << size << " words in the file.\n";
    cout << "\nThere are " << listSize << " words in the filtered list\n\n";
    cout << "\nThere are " << uniqSize << " unique words in the filtered list\n\n";
    system("pause");
}
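This should also get you back to managing the list properly in the wordCloud::~wordCloud() destructor, which is where the nodes should actually be deleted. There are a lot of other things I would do differently, but this is a learning process, so I won't spoil your party.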
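As a rough sketch of what destructor-based cleanup could look like: the class below is a minimal stand-in, not the asker's actual class. The member names head, size, myWord, freq_count and next follow the question. The wordNode constructor, the initial count of 1, and the helpers clear(), count() and empty() are assumptions made for illustration.

```cpp
#include <string>

// Minimal stand-in for the question's node type (constructor is assumed).
struct wordNode {
    std::string myWord;
    int freq_count;
    wordNode *next;
    explicit wordNode(const std::string &w)
        : myWord(w), freq_count(1), next(nullptr) {}
};

class wordCloud {
public:
    wordCloud() : head(nullptr), size(0) {}
    ~wordCloud() { clear(); }  // all node deletion happens here

    // Copying is disabled so two objects never own the same nodes.
    wordCloud(const wordCloud &) = delete;
    wordCloud &operator=(const wordCloud &) = delete;

    // Push a new node at the front of the list.
    void insertWord(const std::string &aWord) {
        wordNode *node = new wordNode(aWord);
        node->next = head;
        head = node;
        ++size;
    }

    // Hypothetical helper: delete every node in the list.
    void clear() {
        while (head != nullptr) {
            wordNode *doomed = head;  // remember the node to delete
            head = head->next;        // advance before deleting
            delete doomed;            // nodes came from new, so use delete, not free()
        }
        size = 0;
    }

    int count() const { return size; }
    bool empty() const { return head == nullptr; }

private:
    wordNode *head;
    int size;
};
```

The pointer is advanced before the node is deleted, and it is never set to NULL first. Calling free() or delete on a null pointer does nothing, which is why the original print loop appeared to work while releasing no memory.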
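Update
At the OP's request, below is a sample linked-list insertion function that keeps the list sorted as it is built. While adapting it, the OP found significant differences from the original implementation, and problems with that implementation. Hopefully it helps others too.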
void wordCloud::insert(const std::string& aWord, unsigned int freq)
{
    // manufacture lower-case version of word;
    std::string lcaseWord = make_lower(aWord);
    // search for the word by walking a pointer-to-pointer
    // through the pointers in the linked list.
    wordNode** pp = &head;
    while (*pp && ((*pp)->myWord < lcaseWord))
        pp = &(*pp)->next;
    if (*pp && !(lcaseWord < (*pp)->myWord))
    {
        // word already present: bump its count
        (*pp)->freq_count++;
    }
    else
    {   // insert the node
        wordNode *node = new wordNode(lcaseWord);
        node->freq_count = freq;
        node->next = *pp;
        *pp = node;
        ++size;
    }
}
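To try the pointer-to-pointer technique outside the class, here is a self-contained sketch written as free functions. The names insertSorted and destroy are hypothetical. The body of make_lower is an assumption, since the answer calls it without showing it. The freq parameter is left out for brevity, so new nodes simply start with a count of 1.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Minimal stand-in for the question's node type (constructor is assumed).
struct wordNode {
    std::string myWord;
    unsigned int freq_count;
    wordNode *next;
    explicit wordNode(const std::string &w)
        : myWord(w), freq_count(1), next(nullptr) {}
};

// Assumed implementation of the helper the answer refers to.
std::string make_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Insert in sorted order, or bump the count if the word already exists.
// Returns true when a new node was created.
bool insertSorted(wordNode *&head, const std::string &aWord) {
    const std::string lcaseWord = make_lower(aWord);

    // pp always points at the pointer that would have to change
    // if a node were inserted at the current position.
    wordNode **pp = &head;
    while (*pp && (*pp)->myWord < lcaseWord)
        pp = &(*pp)->next;

    if (*pp && (*pp)->myWord == lcaseWord) {
        (*pp)->freq_count++;  // existing word: one node, higher count
        return false;
    }

    wordNode *node = new wordNode(lcaseWord);
    node->next = *pp;  // splice in front of the first larger word
    *pp = node;
    return true;
}

// Free every node and reset the head pointer.
void destroy(wordNode *&head) {
    while (head) {
        wordNode *doomed = head;
        head = head->next;
        delete doomed;
    }
}
```

Because pp holds the address of either head or some node's next member, inserting at the front, in the middle, and at the end all go through the same two assignments, with no special case for an empty list.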
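Answer 1 (score: 0)
I think that to print each word only once, you have to create a list of unique words that holds each word from the original list together with its number of occurrences. You only need two loops for that: one to take each word from the original list, and another to check whether that word is already in the unique list. So make a second list and copy each word into it once; if a word occurs more than once, just increment its frequency.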
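The two-loop idea could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the helper names pushFront, findWord, buildUniqueList and freeList are hypothetical, and the node type is a minimal stand-in whose count starts at 1.

```cpp
#include <string>

// Minimal stand-in for the question's node type (constructor is assumed).
struct wordNode {
    std::string myWord;
    int freq_count;
    wordNode *next;
    explicit wordNode(const std::string &w)
        : myWord(w), freq_count(1), next(nullptr) {}
};

// Add a node at the front of a list.
void pushFront(wordNode *&head, const std::string &word) {
    wordNode *node = new wordNode(word);
    node->next = head;
    head = node;
}

// Inner loop: return the node holding word, or nullptr if absent.
wordNode *findWord(wordNode *head, const std::string &word) {
    for (wordNode *n = head; n != nullptr; n = n->next)
        if (n->myWord == word)
            return n;
    return nullptr;
}

// Outer loop: visit every word of the original list once and either
// bump its count in the unique list or copy it there.
wordNode *buildUniqueList(const wordNode *original) {
    wordNode *unique = nullptr;
    for (const wordNode *src = original; src != nullptr; src = src->next) {
        wordNode *hit = findWord(unique, src->myWord);
        if (hit != nullptr)
            ++hit->freq_count;               // seen before: count it
        else
            pushFront(unique, src->myWord);  // first occurrence: copy it
    }
    return unique;
}

// Release every node of a list.
void freeList(wordNode *&head) {
    while (head != nullptr) {
        wordNode *doomed = head;
        head = head->next;
        delete doomed;
    }
}
```

Printing the unique list then shows each word once with its final count. The nested search makes this quadratic in the number of words, which is usually acceptable for a small text file.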