Asking for text input to edit and format

Time: 2019-01-16 12:23:13

Tags: c++ replace

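I want to write a program that asks for comma-separated text (a paragraph with several words). The program should transform the text by adding tags around each word, for example to format the text as HTML.

Example: word1, word2, word3 should become <a> word1 </a>, <a> word2 </a>, <a> word3 </a>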

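So I started writing this code, but I don't know how to continue. How can I test the text to find where a word starts? Should I compare ASCII values? Or is there maybe a table I can use to test every case?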

I'm not necessarily asking for a complete answer, but maybe for some direction to follow.

#include <iostream>
#include <iomanip>
#include <string> //For getline()
#include <cstdlib> //For system()

using namespace std;

// Creating class
class GetText
{
public:
    string text;
    string line; //Using this as a buffer

    void userText()
    {
        cout << "Please type a message: ";

        do
        {
            getline(cin, line);
            text += line;
        }
        while(line != "");
    }

    void to_string()
    {
        cout << "\n" << "User's Text: " << "\n" << text << endl;
    }
};


int main() {
    GetText test;
    test.userText();
    test.to_string();
    system("pause");

    return 0;
}

2 answers:

Answer 0 (score: -1)

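What you need to do next is split the input into a vector using the delimiter (',' in your case), and then join everything back together with the prefix and suffix added to each piece. std::string has no built-in split function, so you have to get creative or look for a solution such as the one here.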

Answer 1 (score: -2)

If you want to keep it very simple, you can detect word boundaries by looking at two characters at a time. Here is a working example:

#include <iostream>
#include <string>
#include <cctype>

// Must come after the includes: std has to be declared before it can be used here
using namespace std;

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;  // index at which the tag should be inserted
} boundary_t;

// A word character is printable ASCII that is neither whitespace nor ','
bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !isspace(c) && c != ',';
}

// Looks at str[pos] and str[pos + 1] and reports whether a word starts (LEFT)
// or ends (RIGHT) there
boundary_t maybe_word_boundary(const string& str, int pos) {
    int len = static_cast<int>(str.length());
    if (pos < 0 || pos >= len) {
        return boundary_t{E_BOUNDARY_TYPE_ERROR, -1};
    }
    if (pos == 0 && is_word_char(str[pos])) {
        // if the first character is word-y, we have a left boundary at the beginning
        return boundary_t{E_BOUNDARY_TYPE_LEFT, pos};
    } else if (pos == len - 1 && is_word_char(str[pos])) {
        // if the last character is word-y, we have a right boundary at the end of the string
        return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
    } else if (pos + 1 < len && !is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
        // if we have a delimiter followed by a word char, we have a left boundary left of the word char
        return boundary_t{E_BOUNDARY_TYPE_LEFT, pos + 1};
    } else if (pos + 1 < len && is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
        // if we have a word char followed by a delimiter, we have a right boundary right of the word char
        return boundary_t{E_BOUNDARY_TYPE_RIGHT, pos + 1};
    }
    return boundary_t{E_BOUNDARY_TYPE_NONE, -1};
}

int main() {
    string str;
    const string ins_left("<tag>");
    const string ins_right("</tag>");
    getline(cin, str);

    // str.length() is re-evaluated on every iteration, so the loop sees the inserted tags
    for (int i = 0; i < static_cast<int>(str.length()); i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            str.insert(boundary.pos, ins_left);
            // after i++ the loop lands on the first character of the word,
            // so a one-character word still gets its closing tag
            i = boundary.pos + static_cast<int>(ins_left.length()) - 1;
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            str.insert(boundary.pos, ins_right);
            // after i++ the loop lands on the first character after the closing tag
            i = boundary.pos + static_cast<int>(ins_right.length()) - 1;
        }
    }

    cout << str << endl;
    return 0;
}

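Using an enum class would be nicer, but I forgot the syntax. You could also copy into a new buffer instead of modifying the string in place; I just wanted to keep it simple. Feel free to extend it into a class-based C++ style. To get the exact output you want, strip the spaces first and then add spaces to ins_left and ins_right.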