Is there a way to get '\n' from a stream?

Time: 2014-04-15 15:53:20

Tags: c++ string stream

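I'm trying to take a file and convert it into a kind of data structure (a Text is an "array" of paragraphs, a paragraph is an "array" of sentences, and a sentence is an "array" of words, which are char*).

To make things easy for myself, I'm using streams (ifstream, to be exact), but one problem I've run into is determining where a paragraph ends (two '\n' in a row count as the end of a paragraph). The simple way would be to go through the text char by char and check whether each one is a space or a '\n', but that is long and kind of painful.

The code looks like this: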

    std::ifstream fd(filename);
    char buffer[128];

    while(fd >> buffer)
    {
        /* Some code in here that does things with buffer */
    }

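And, well, it works, but it completely ignores the paragraph breaks. fd.get(buffer, 128, '\n') doesn't work as needed either: it cuts everything off after the first read.

So, is there an easier way than reading char by char? Since the assignment forbids us from using vector or string, getline() can't be used.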

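Update

So it looks like std::istream::getline might work for me, but it still doesn't do what I expected. It reads the first line, and then something strange happens.

Here is the code: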

    std::ifstream fd(fl);
    char buffer[128];
    fd.getline(buffer, 128);
    std::cout << "555 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;
    fd.getline(buffer, 128);
    std::cout << "777 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;

The output looks like this:

    ]55 - [text from file
    23
    ]77 - [
    2

And, yes, I don't think I understand what is happening.
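
The output above is consistent with a file that uses Windows line endings ("\r\n"): istream::getline stops at '\n' and leaves the '\r' at the end of the buffer, so the terminal jumps back to the start of the line and prints ']' over the first character. A gcount() of 2 on the second call would then be an empty line, made up of the '\r' plus the extracted '\n'. The following is a minimal sketch under that assumption, using only char buffers. The names stripCarriageReturn, countParagraphs and countParagraphsInText are hypothetical and do not come from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>

// Hypothetical helper: removes one trailing '\r' left over from a "\r\n"
// line ending and returns the new length of the line.
std::size_t stripCarriageReturn(char *line) {
    assert(line != nullptr);
    std::size_t len = std::strlen(line);
    if (len > 0 && line[len - 1] == '\r') {
        line[--len] = '\0';
    }
    return len;
}

// Hypothetical helper: counts paragraphs, treating one or more empty lines
// as a separator. Assumes no line is longer than 127 characters (the buffer
// size used in the question); a longer line sets failbit and ends the loop.
int countParagraphs(std::istream &in) {
    char line[128];
    int paragraphs = 0;
    bool inParagraph = false;
    while (in.getline(line, sizeof line)) {
        if (stripCarriageReturn(line) == 0) {
            inParagraph = false;   // empty line: the current paragraph ended
        } else if (!inParagraph) {
            inParagraph = true;    // first non-empty line of a new paragraph
            ++paragraphs;
        }
    }
    return paragraphs;
}

// Convenience wrapper for trying the function on in-memory text.
int countParagraphsInText(const char *text) {
    std::istringstream in(text);
    return countParagraphs(in);
}
```

The same loop works on an std::ifstream, since it only needs an std::istream&. Where the sketch increments the counter, real code would start a new paragraph in its own data structure.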

1 Answer:

Answer 0: (score: 1)

As I understand it, you may not use any standard containers.

So here is what I think is possible:

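  1. Read the entire file into a buffer
  2. Tokenize the buffer into paragraphs
  3. Tokenize each paragraph into sentences
  4. Tokenize each sentence into words

For the first part, you can use: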

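    #include <cstddef>
    #include <cstring>
    #include <fstream>
    #include <iostream>
    #include <new>

    //! Reads a file into a null-terminated buffer that the caller must delete[]
    char* readFile(const char *filename) {
      std::ifstream ifs(filename, std::ifstream::binary);

      if (!ifs.good()) // Check the stream, not the file name
        return NULL;

      ifs.seekg(0, ifs.end);
      std::streamoff len = ifs.tellg();
      ifs.seekg(0, ifs.beg);
      if (len < 0) // tellg() reports failure as -1
        return NULL;

      // One extra byte for '\0', because the tokenizers rely on strstr/strlen
      char* buffer = new (std::nothrow) char[static_cast<size_t>(len) + 1];
      if (!buffer) // Check for failed allocation
        return NULL;

      ifs.read(buffer, len);
      if (ifs.gcount() != len) { // Check if the entire file was read
        delete[] buffer;
        return NULL;
      }
      buffer[len] = '\0';
      return buffer; // ifs is closed by its destructor
    }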
    

With that function ready, all we need now is to use it and tokenize the string. To do that, we have to define our types (based on linked lists, in C style):

    struct Word {
      char *contents;
      Word *next;
    };
    
    struct Sentence {
      Word *first;
      Sentence *next;
    };
    
    struct Paragraph {
      Sentence *first;
      Paragraph *next;
    };
    
    struct Text {
      Paragraph *first;
    };
    

With the types defined, we can now start reading our text:

    //! Splits a sentence into as many Word elements as possible
    void readSentence(char *buffer, size_t len, Word **target) {
        *target = NULL; // Leave the list terminated if there is nothing to read
        if (!buffer || len == 0) return;

        buffer += strspn(buffer, " \t\r\n"); // Skip leading separators so no empty words are created
        if (*buffer == '\0') return;

        *target = new Word;
        (*target)->next = NULL;

        char *end = strpbrk(buffer, " \t\r\n");
        size_t wordLen = end ? (size_t)(end - buffer) : strlen(buffer);

        // Allocate with new[] (not the MSVC-only _strdup) so that delete[] can free every word
        (*target)->contents = new char[wordLen + 1];
        memcpy((*target)->contents, buffer, wordLen);
        (*target)->contents[wordLen] = '\0';

        if (end != NULL)
            readSentence(end + 1, strlen(end + 1), &(*target)->next);
    }
    
    //! Splits a paragraph from a text buffer into as many Sentence elements as possible
    void readParagraph(char *buffer, size_t len, Sentence **target) {
        *target = NULL; // Leave the list terminated if there is nothing to read
        if (!buffer || *buffer == '\0' || len == 0) return;

        *target = new Sentence;
        (*target)->first = NULL; // readSentence may return without storing a word
        (*target)->next = NULL;

        char *end = strpbrk(buffer, ".;:?!");

        if (end != NULL) {
            // Copy the sentence, including its closing punctuation mark
            char *t = new char[end - buffer + 2];
            strncpy(t, buffer, end - buffer + 1);
            t[end - buffer + 1] = '\0';
            readSentence(t, (size_t)(end - buffer + 1), &(*target)->first);
            delete[] t;

            readParagraph(end + 1, len - (end - buffer + 1), &(*target)->next);
        }
        else {
            readSentence(buffer, len, &(*target)->first);
        }
    }
    
    //! Splits as many Paragraph elements as possible from a text buffer
    void readText(char *buffer, Paragraph **target) {
        *target = NULL; // Leave the list terminated if there is nothing to read
        if (!buffer || *buffer == '\0') return;

        *target = new Paragraph;
        (*target)->first = NULL; // readParagraph may return without storing a sentence
        (*target)->next = NULL;

        // A paragraph ends at an empty line. Only "\n\n" is matched here;
        // "\r\n\r\n" (Windows line endings) will not match.
        char *end = strstr(buffer, "\n\n");
        if (end != NULL) {
            // Copy the paragraph and pass it to the sentence parser
            char *t = new char[end - buffer + 1];
            strncpy(t, buffer, end - buffer);
            t[end - buffer] = '\0';
            readParagraph(t, (size_t)(end - buffer), &(*target)->first);
            delete[] t;

            readText(end + 2, &(*target)->next);
        }
        else
            readParagraph(buffer, strlen(buffer), &(*target)->first);
    }
    
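    //! Builds a Text from a null-terminated buffer; the buffer itself is not kept
    Text* createText(char *contents) {
        Text *text = new Text;
        text->first = NULL; // Stays NULL if contents is NULL or empty
        readText(contents, &text->first);
        return text;
    }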
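
One caveat about the paragraph split: readFile opens the file in binary mode, so a file saved with Windows line endings contains "\r\n\r\n" between paragraphs, which strstr(buffer, "\n\n") does not find. A minimal sketch of one way around this, assuming the buffer is null-terminated and may be modified in place. The helper normalizeNewlines is hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: rewrites every "\r\n" as "\n" in place and returns
// the new length. A lone '\r' that is not followed by '\n' is kept as is.
std::size_t normalizeNewlines(char *buffer) {
    assert(buffer != nullptr);
    char *out = buffer;
    for (const char *in = buffer; *in != '\0'; ++in) {
        if (in[0] == '\r' && in[1] == '\n') {
            continue;   // drop the '\r'; the '\n' is copied on the next pass
        }
        *out++ = *in;
    }
    *out = '\0';
    return static_cast<std::size_t>(out - buffer);
}
```

It would be called once on the buffer returned by readFile, before the buffer is handed to createText.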
    

For example, you could use it like this:

    int main(int argc, char **argv) {
        char *buffer = readFile("mytext.txt");
        if (buffer == NULL) { // readFile returns NULL when the file cannot be read
            std::cerr << "Could not read mytext.txt" << std::endl;
            return 1;
        }
        Text *text = createText(buffer);
        delete[] buffer;
    
        for (Paragraph* p = text->first; p != NULL; p = p->next) {
            for (Sentence* s = p->first; s != NULL; s = s->next) {
                for (Word* w = s->first; w != NULL; w = w->next) {
                    std::cout << w->contents << " ";
                }
            }
            std::cout << std::endl << std::endl;
        }
    
        return 0;
    }
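
The example never releases the lists it builds. A minimal cleanup sketch follows, assuming every node was allocated with new and every contents array with new char[]. If contents comes from _strdup or malloc, it has to be released with free instead. The function destroyText is hypothetical, and the structs are repeated only so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Word { char *contents; Word *next; };
struct Sentence { Word *first; Sentence *next; };
struct Paragraph { Sentence *first; Paragraph *next; };
struct Text { Paragraph *first; };

// Hypothetical helper: frees a Text and everything it owns.
// Returns the number of Word nodes released, which is handy for checking.
std::size_t destroyText(Text *text) {
    if (text == NULL) return 0;
    std::size_t freedWords = 0;
    for (Paragraph *p = text->first; p != NULL; ) {
        for (Sentence *s = p->first; s != NULL; ) {
            for (Word *w = s->first; w != NULL; ) {
                Word *nextWord = w->next;   // remember the link before deleting
                delete[] w->contents;
                delete w;
                ++freedWords;
                w = nextWord;
            }
            Sentence *nextSentence = s->next;
            delete s;
            s = nextSentence;
        }
        Paragraph *nextParagraph = p->next;
        delete p;
        p = nextParagraph;
    }
    delete text;
    return freedWords;
}
```

In main, it would be called on text after the printing loop, just before return 0.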
    

Keep in mind that this code may or may not work, as I haven't tested it.
