Question

Okay. So I have this function, init():

void init()
{
    fstream file;
    int index = 0;

    char temp_list[60000][15];

    listlen = 0;
    current_index = 0;

    file.open("en_US.dic");
    while(!file.eof())
    {
        file >> temp_list[index];
        index++;
    }

    listlen = index;
    file.close();
    file.open("en_US.dic");

    word_list = new char*[listlen];

    int count = 0;
    for(int i = 0; i < listlen; i++)
    {
        word_list[i] = new char[21];
        file >> word_list[i];
    }

    file.close();
}

This code compiles and runs correctly, with no errors. However, when I change the line

word_list[i] = new char[21];

to

word_list[i] = new char[x]; // x < 21

I get the following error:

dict: malloc.c:3074: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.

I'm fairly new to programming (< 2 years), and I have never seen anything like this. Does anyone have an idea? Thanks in advance!

Answer 1

The code has three major problems. The first two are here:

while (!file.eof())
{   
    file >> temp_list[index];
    index++;
}

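You cannot test file.eof() to find out whether the next operation will fail. It only tells you that a previous operation hit end-of-file, and it is usually only useful after an extraction has already failed. So change the loop to: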

while (file >> temp_list[index]) {
    index++;
}

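Because extraction (>>) returns the stream, and a stream can be tested directly, this code now tests the stream on every iteration and only increments index when the extraction succeeded.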
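To see the difference between the two loops, here is a minimal sketch that runs both over an in-memory stream (std::istringstream stands in for the file, and the function names are made up for illustration). When the input ends with a newline, as most text files do, the eof()-driven loop runs one extra time: the last extraction fails, but the counter is incremented anyway.

```cpp
#include <sstream>
#include <string>

// Mirrors the question's loop: test eof() first, then extract.
// The final extraction fails, yet the counter is still incremented.
int count_with_eof_loop(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    int n = 0;
    while (!in.eof()) {
        in >> word;  // may fail on the last pass
        ++n;
    }
    return n;
}

// The answer's idiom: extract, then test the stream's state.
// The counter only moves when the extraction succeeded.
int count_with_extraction_loop(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    int n = 0;
    while (in >> word) {
        ++n;
    }
    return n;
}
```

For the input "apple banana cherry\n", the first function returns 4 and the second returns 3.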

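Now, when you extract into a char array, input streams stop at whitespace, but they don't know the maximum length the array can hold unless you tell them. The same mistake appears later in the code, and it is probably why you see what you see: I suspect you are reading much more data than you expect and trampling over memory. Fix (std::setw is declared in <iomanip>):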

while (file >> std::setw(15) >> temp_list[index]) {
    index++;
}
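Here is a small sketch of what std::setw does when the target is a char array. It assumes an 8-byte buffer and uses an in-memory stream in place of en_US.dic; read_bounded is a hypothetical helper.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated words into a small fixed-size buffer.
// std::setw tells operator>> the buffer size, so it stores at most
// sizeof(buf) - 1 characters plus the terminating '\0'.
std::vector<std::string> read_bounded(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    char buf[8];
    // setw must be repeated: the width is reset to 0 after each extraction.
    while (in >> std::setw(static_cast<int>(sizeof buf)) >> buf) {
        out.push_back(buf);
    }
    return out;
}
```

With the input "supercalifragilistic ok", this yields "superca", "lifragi", "listic" and "ok". The width limit protects memory, but an over-long word is silently split into several entries. That is one more reason to prefer std::string.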

The last major problem, however, is that you allocate resources and leak them. Use vector and string instead:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void init() {
  typedef std::vector<std::string> C; // for later convenience
  C words;
  {
    std::ifstream file("en_US.dic"); // closed automatically at the end of this block
    if (!file) {
      std::cerr << "could not open file\n";
      // handle error: throw an exception, call abort(), etc.
    }
    for (std::string word; file >> word;) {
      words.push_back(word);
    }
    // if you want to read lines instead:
    //for (std::string line; std::getline(file, line);) {
    //  words.push_back(line);
    //}
  }
  // now use words[0] through words[words.size() - 1]
  std::cout << "Read " << words.size() << " words:\n";
  for (C::size_type i = 0; i < words.size(); ++i) {
    std::cout << "  " << words[i] << '\n';
  }
  std::cout << "Output again:\n";
  for (C::const_iterator i = words.begin(); i != words.end(); ++i)
  {
    std::cout << "  " << *i << '\n';
  }
}

Answer 2

I'm guessing one of your words is longer than the value you used for x.

When that happens, you overflow the buffer that was allocated for it.

If you allocate N bytes, you must make sure you never write more than N bytes.

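Using operator>> with char buffers is a recipe for disaster. operator>> keeps reading and writing until it reaches a word separator. Because operator>> does not know how big the char* buffer is, it overflows the buffer whenever a word is longer than the buffer. If you want to use operator>> to extract words, use std::string.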
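The following sketch illustrates that advice (read_words is a hypothetical name). When you extract into a std::string, the string grows to fit each word, so even a 100-character word is stored intact.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Extracts every whitespace-separated word. std::string grows as needed,
// so no word length can overflow a buffer.
std::vector<std::string> read_words(std::istream& in) {
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(w);
    }
    return words;
}

// Convenience overload for in-memory text.
std::vector<std::string> read_words(const std::string& text) {
    std::istringstream in(text);
    return read_words(in);
}
```

Passing an opened std::ifstream to the first overload would read en_US.dic in the same way.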

What's happening

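A very common way to implement malloc is to keep bookkeeping data between the buffers that malloc returns. When you overwrite this data, malloc's assumptions about its data structures no longer hold.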

So the heap managed by malloc looks something like this:

+------------------+-------------+------------------+-------------+-----------
| malloc internals | user buffer | malloc internals | user buffer | etc...
+------------------+-------------+------------------+-------------+-----------

So if you allocated 8 bytes for a user buffer but then wrote 12 bytes, you have just clobbered the first 4 bytes of the next malloc internal record.
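The diagram above can be modelled with a toy allocator. This is a simplified sketch and not glibc's actual chunk format. The ToyHeap class, its 8-byte size header, and its method names are assumptions made for illustration. The overflow happens inside a std::vector, so the demonstration itself stays within valid memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <stdexcept>
#include <vector>

// Toy allocator (NOT glibc's real layout): each chunk is an 8-byte header
// holding the user size, immediately followed by the user bytes.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : arena_(capacity, 0) {}

    // Reserves a header plus n user bytes; returns the offset of the user bytes.
    std::size_t alloc(std::uint64_t n) {
        if (top_ + sizeof n + n > arena_.size()) throw std::bad_alloc();
        std::memcpy(&arena_[top_], &n, sizeof n);  // write the bookkeeping header
        std::size_t user = top_ + sizeof n;
        top_ = user + n;
        return user;
    }

    // Reads back the size recorded in the header in front of a user buffer.
    std::uint64_t recorded_size(std::size_t user) const {
        std::uint64_t n;
        std::memcpy(&n, &arena_[user - sizeof n], sizeof n);
        return n;
    }

    // Writes count bytes starting at a user buffer without checking that
    // buffer's size: the same mistake as operator>> into a too-small char array.
    void unchecked_fill(std::size_t user, std::size_t count, unsigned char value) {
        if (user + count > arena_.size()) throw std::out_of_range("outside arena");
        std::memset(&arena_[user], value, count);
    }

private:
    std::vector<unsigned char> arena_;
    std::size_t top_ = 0;
};
```

Suppose you allocate two 8-byte buffers and then write 12 bytes into the first. The size recorded in the second buffer's header changes. This is the same kind of damage that makes the real malloc's consistency checks fail.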

Answer 3

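If any word in the file is longer than 20 characters, file >> word_list[i] writes past the end of the allocated buffer, which could cause the error you are seeing. (A 21-byte buffer holds at most 20 characters plus the terminating '\0'.) This is called a buffer overflow.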

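This is also a problem when writing to temp_list. In that case, however, the buffer overflow is less destructive, because it will probably only overwrite the memory used for the next word.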

One way to fix this is to use std::string instead of an array of char*. The allocation is then handled automatically.
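If fixed-size char buffers have to stay, you can make the size arithmetic explicit. In this minimal sketch (copy_if_fits is a hypothetical helper), a 21-byte buffer like the one in the question accepts a 20-character word and rejects a 21-character one.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies word (plus its '\0' terminator) into buf only if it fits.
// A buffer of buf_size bytes holds at most buf_size - 1 characters.
bool copy_if_fits(const std::string& word, char* buf, std::size_t buf_size) {
    if (word.size() + 1 > buf_size) {
        return false;  // copying would write past the end of buf
    }
    std::memcpy(buf, word.c_str(), word.size() + 1);
    return true;
}
```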

Answer 4

You may want to change your design here. Dictionaries are large.
Do you really need to bring all of the words (the data) into memory?

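Because dictionaries are large, they are designed so that they never have to be entirely in memory at once. Professional dictionaries use an index table that is much smaller than the whole data file. The idea is that the index table is small enough to be loaded into memory and kept there, instead of loading all of the data at once.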

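I have optimized programs by keeping the first-level index table in memory. A lookup in the first index table yields a file offset into another table (or the name of another file). That secondary table is loaded into memory if necessary, and so on, until the exact item is found.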
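Here is a minimal single-level sketch of this idea. It makes several assumptions: the dictionary file is sorted, its words are separated by whitespace, and a std::istream stands in for the file. Only every block_size-th word and its stream offset are kept in memory. A lookup binary-searches that small index, seeks to one block, and scans only that block. The names IndexEntry, build_sparse_index and contains are hypothetical. A real B+ tree adds more levels and fixed-size disk blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream can stand in for the file when testing
#include <string>
#include <vector>

// One in-memory index entry per block of the sorted word file:
// the block's first word and the stream position where it starts.
struct IndexEntry {
    std::string first_word;
    std::streampos offset;
};

// Scans the file once and keeps only every block_size-th word in memory.
std::vector<IndexEntry> build_sparse_index(std::istream& in, std::size_t block_size) {
    std::vector<IndexEntry> index;
    std::size_t count = 0;
    for (;;) {
        in >> std::ws;                    // skip to the start of the next word
        std::streampos pos = in.tellg();  // remember where that word begins
        std::string word;
        if (!(in >> word)) break;
        if (count % block_size == 0) index.push_back({word, pos});
        ++count;
    }
    return index;
}

// Binary-searches the in-memory index, seeks to one block, and scans only it.
bool contains(std::istream& in, const std::vector<IndexEntry>& index,
              const std::string& target) {
    // First entry whose first word sorts after target...
    auto it = std::upper_bound(
        index.begin(), index.end(), target,
        [](const std::string& t, const IndexEntry& e) { return t < e.first_word; });
    if (it == index.begin()) return false;  // target sorts before every word
    --it;                                   // ...so only the previous block can hold it
    in.clear();                             // reset eof/fail left by earlier reads
    in.seekg(it->offset);
    for (std::string word; in >> word;) {
        if (word == target) return true;
        if (word > target) return false;    // sorted file: we are past it
    }
    return false;
}
```

With the words "apple banana cherry date elder fig grape" and a block_size of 3, the index holds only apple, date and grape. Looking up "fig" reads just the block that starts at date.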

See the following topics (search the web):

B+ Tree
Index tables
Block I/O
File offsets

Answer 5

This really messes things up:

for(int i = 0; i < listlen; i++)
{
    word_list[i] = new char[21];
    file >> word_list[i];
}

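If any word is longer than 20 characters (+1 for the '\0'), you basically scribble over memory that was being used by the memory manager. This causes all sorts of problems for subsequent allocations and deallocations.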

It worked in the previous loop because that buffer is contiguous:

char temp_list[60000][15];

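Although a word in one row may spill over into the next row, that won't be a problem unless you actually read a long word into temp_list[59999], where it would spill over onto another variable.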
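You can check the contiguity claim directly. This small sketch uses a 4-row array with the same 15-byte row width; rows and row_offset are illustrative names. Converting addresses to std::uintptr_t is implementation-defined, but on common platforms it gives plain byte addresses. Row i starts exactly i * 15 bytes after row 0, so the bytes just past rows[0] are the start of rows[1], not malloc's bookkeeping.

```cpp
#include <cstddef>
#include <cstdint>

// Same 15-byte row width as the question's temp_list, but only 4 rows.
char rows[4][15];

// No padding is allowed between array elements, so the whole array is
// exactly 4 * 15 bytes.
static_assert(sizeof rows == 4 * 15, "rows are packed back to back");

// Byte distance from the start of row 0 to the start of row i.
// Converting pointers to std::uintptr_t is implementation-defined, but on
// common platforms it yields plain byte addresses.
std::ptrdiff_t row_offset(std::size_t i) {
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&rows[0][0]);
    std::uintptr_t row = reinterpret_cast<std::uintptr_t>(&rows[i][0]);
    return static_cast<std::ptrdiff_t>(row - base);
}
```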

Very strange malloc error
