Question

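I'm trying to count the words in a text file supplied by the user, store them in a vector, and then write an output file whose first line contains the number of words and whose subsequent lines contain the words from the vector in sorted order. Any ideas what's wrong?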

#include <iostream>
#include  <iomanip>
#include <vector>
#include <string>
#include <fstream>
#include <algorithm>
#include <stdio.h>
#include <ctype.h>

using namespace std;

//gets user input for file names to open/write to
string getUserInput (string inputORoutput) {
  cout << "Enter desired " << inputORoutput << " filename (include file extension). ";
  string userInput;
  getline(cin,userInput);
  return userInput;
}
//ensures that string word is an alphabetical word
string isAlpha (string& word) {
  string newWord;
  for (int i = 0; i < word.length(); i++) {
    if (isalpha(word[i])) {
      newWord += word[i];
    }
    else if (isspace(word[i])) {
      word[i] = word[i+1];
    }
    else {
      newWord = "";
    }
  }
  return newWord;
}
//removes empty elements of uniqueWords
void removeEmptyLines (vector<string>& uniqueWords) {
  for (int i = 0; i < uniqueWords.size(); i++) {
    if (uniqueWords [i] == "") {
      uniqueWords.erase(uniqueWords.begin() + i);
    }
  }
}
//calls isAlpha, calls removeEmptyLines, & sorts uniqueWords in alphabetical order
void sortUniqueWords (vector<string>& uniqueWords) {
  sort (uniqueWords.begin(), uniqueWords.end());
  for (int i = 0; i <= uniqueWords.size(); i++) { //remove this loop if digits are allowed
    uniqueWords[i] = isAlpha(uniqueWords[i]);
  }
  removeEmptyLines(uniqueWords); //remove this loop if digits are allowed
  if (uniqueWords.size() == 2) { //alpha.txt wont work without this
    uniqueWords [1] = "";
  }
}
//adds a new unique word to uniqueWords vector
void addUniqueWord (vector<string>& uniqueWords, string lineToAdd) {
  bool doesContain = false;
  int i = 0;
  while (i <= uniqueWords.size() && !doesContain) {
    if (lineToAdd == uniqueWords [i]) {
      doesContain = true;
    }
    else {
      i++;
    }
  }
  if (!doesContain) {
    uniqueWords.push_back(lineToAdd);
  }
}

int main(int argc, const char * argv[]) {
  vector<string> uniqueWords(1); //for some reason the program produces error EXC_BAD_ACCESS (code=1, address=0x0)
  string fileName;
  ifstream inFile;
  inFile.open(getUserInput("input"));
  string currentLine = "";
  while (getline(inFile, currentLine)) { //reads input and tests for failure
    addUniqueWord (uniqueWords, currentLine);
  }
  uniqueWords.erase(uniqueWords.begin() + 1);
  uniqueWords.erase(uniqueWords.begin());
  sortUniqueWords (uniqueWords);
  inFile.close();
  ofstream outFile;
  outFile.open(getUserInput("output"));
  for (int i = 0; i <= uniqueWords.size(); i++) {
    outFile << uniqueWords[i] << endl;
  }
  return 0;
}

Answer 1

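In my opinion, rather than trying to fix this code, it would be better to start over and create something simpler and more efficient.

The goal appears to be the number of words followed by a sorted list of them. You apparently want to treat only contiguous strings of letters as words. Assuming that's the case, I'd go about the job quite differently. First, I'd create a ctype facet that classifies letters as letters and everything else as "white space":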

#include <algorithm>
#include <locale>
#include <vector>

struct alpha_only: std::ctype<char> {
  alpha_only(): std::ctype<char>(get_table()) {}

  static std::ctype_base::mask const* get_table() {
    // As far as we care, everything is white-space:
    static std::vector<std::ctype_base::mask>
      rc(std::ctype<char>::table_size, std::ctype_base::space);

    // except letters (the end pointer is one past 'z' / 'Z', so the last letter is included):
    std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::alpha);
    std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::alpha);
    return &rc[0];
  }
};
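To see what the facet does in isolation, here is a small sketch that imbues a std::istringstream with it and extracts words. split_words is a hypothetical helper name, not part of the answer. The facet is repeated so the block compiles on its own, and the sketch assumes an ASCII-like character set in which the letters are contiguous.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Copy of the facet, with letter ranges that include 'z' and 'Z'.
struct alpha_only : std::ctype<char> {
    alpha_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Everything counts as white-space...
        static std::vector<std::ctype_base::mask> rc(
            std::ctype<char>::table_size, std::ctype_base::space);
        // ...except letters.
        std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::alpha);
        std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::alpha);
        return &rc[0];
    }
};

// Hypothetical helper: splits text into runs of letters using the facet.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale(), new alpha_only));

    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    // Every non-letter is skipped as white-space, so reading only stops at end of input.
    assert(in.eof());
    return words;
}
```

Calling split_words("Hello, world! 42 foo-bar") should yield Hello, world, foo and bar: punctuation and digits are skipped exactly as spaces would be.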

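Then, rather than trying to filter and insert only unique words into the vector as they're read, I'd insert all the words into the vector, then sort them and make them unique afterwards: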

#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

int main() {
  // For simplicity, we'll just read from standard input.
  std::cin.imbue(std::locale(std::locale(), new alpha_only));

  // Initialize vector from the input:
  std::vector<std::string> words((std::istream_iterator<std::string>(std::cin)),
                                 std::istream_iterator<std::string>());

  // sort, so that duplicates end up next to each other
  std::sort(words.begin(), words.end());

  // erase the non-unique words
  words.erase(std::unique(words.begin(), words.end()), words.end());

  // Show the number of unique words:
  std::cout << "Number of unique words: " << words.size() << "\n";

  // show the words:
  for (auto const & s : words)
    std::cout << s << "\n";
  return 0;
}
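The order of the two steps matters, because std::unique only collapses duplicates that sit next to each other. A minimal sketch of the difference, using two hypothetical helper names that are not part of the answer:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: sort first, then drop adjacent duplicates.
std::vector<std::string> sorted_unique(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    assert(std::is_sorted(words.begin(), words.end()));
    return words;
}

// Hypothetical helper: std::unique alone, to show what it misses on unsorted input.
std::vector<std::string> adjacent_unique(std::vector<std::string> words) {
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}
```

For the input pear, apple, pear, fig, apple, sorted_unique leaves apple, fig, pear, while adjacent_unique leaves all five entries because no two equal words are adjacent.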

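If you really want to make sure that only unique words are ever stored, a std::set does that even more simply (though it may be slower), as in the code below. If you're dealing with large files (especially ones with many duplicates), you might prefer a std::unordered_set, then copy its contents into a vector and sort that.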

#include <iostream>
#include <iterator>
#include <locale>
#include <set>
#include <string>

int main() {
  // For simplicity, we'll just read from standard input.
  std::cin.imbue(std::locale(std::locale(), new alpha_only));

  // Initialize set from the input (it keeps the words unique and sorted):
  std::set<std::string> words((std::istream_iterator<std::string>(std::cin)),
                              std::istream_iterator<std::string>());

  // Show the number of unique words:
  std::cout << "Number of unique words: " << words.size() << "\n";

  // show the words:
  for (auto const & s : words)
    std::cout << s << "\n";
  return 0;
}
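The std::unordered_set variant mentioned above could be sketched roughly as follows. unique_words_sorted is a hypothetical name, and the sketch splits on ordinary white-space unless the stream passed in has already been imbued with a facet such as alpha_only.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, handy for trying this out
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: collects the distinct tokens from `in`, then sorts them once.
std::vector<std::string> unique_words_sorted(std::istream& in) {
    std::istream_iterator<std::string> first(in), last;

    // Hashing discards duplicates as they are read.
    std::unordered_set<std::string> seen(first, last);

    // The set has no useful order, so copy to a vector and sort.
    std::vector<std::string> words(seen.begin(), seen.end());
    std::sort(words.begin(), words.end());
    return words;
}
```

Passing a std::istringstream holding "b a b c a" should give a, b, c. Only the distinct words are sorted, which is where the saving would come from on inputs with many repeats.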

Answer 2

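The problem with this program is that you threw all of this together without taking the time to understand the consequences and side effects of any individual step, so you aren't entirely clear on what the program is doing at each step. As a result, those of us reading it to try to help you can't tell what you actually intended in most of the steps. For example, the first line of main: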

vector<string> uniqueWords(1); //for some reason the program produces error EXC_BAD_ACCESS (code=1, address=0x0)

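The error means you have a null pointer; it has nothing to do with this line. It looks like you aren't even sure how to use a debugger.

All this line does is create a vector containing one empty string. Why would you want your vector to start with an empty string? It seems to have come back to bite you when you got around to doing this: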

uniqueWords.erase(uniqueWords.begin() + 1);
uniqueWords.erase(uniqueWords.begin());

There - that's pure, solid, bad-code gold. If you were working for someone, that code would be grounds for dismissal.
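One way the placeholder element and the two erase calls could be avoided is to start from an empty vector and let std::find do the membership test. This is only a sketch; add_unique_word is a hypothetical rewrite of the question's addUniqueWord, not code from either answer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends `word` only if it is not already present.
// Returns true when the word was added.
bool add_unique_word(std::vector<std::string>& words, const std::string& word) {
    // std::find works on an empty vector too, so no dummy first element is needed.
    if (std::find(words.begin(), words.end(), word) != words.end()) {
        return false;
    }
    words.push_back(word);
    return true;
}
```

With this, main can declare `std::vector<std::string> uniqueWords;` with no initial size, and nothing has to be erased after the input loop.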

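I actually see some promising practices in parts of your code, assuming they weren't copied from somewhere else. My advice is: STOP. Delete all the code and start over. Work iteratively: take the time to understand how each step works so that you can incorporate it correctly. Step through your code with a debugger and watch not only the flow of the code but also the flow of the data. Look up std::vector and learn how it works, and while you're there, look at other options, like std::hash.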
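As one concrete thing to check while learning how std::vector works: several loops in the question run with `i <= uniqueWords.size()`, and operator[] does no bounds checking, so reading index size() (or index 0 of an empty vector) is undefined behaviour. That may well be where the bad access comes from, though this is an inference from reading the code rather than from a debugger run. A small sketch with hypothetical helper names, using at(), which throws instead:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reports whether index i can be read, using at() so that
// a bad index becomes an exception rather than undefined behaviour.
bool can_read(const std::vector<std::string>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Hypothetical helper: visits every element with the correct bound, i < size().
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        total += words[i].length();
    }
    return total;
}
```

For a vector holding two strings, can_read returns true for indices 0 and 1 and false for index 2, which is exactly the index the `<=` loops go on to touch.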

What is causing this vector error?
