Question

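I have some text stored in a string. Whenever I see a particular sequence of characters, I want to insert some characters after the pattern (shifting all existing characters after that point to higher indices in the string). I think the most efficient approach is to allocate one large character array (large because I don't know exactly how many insertions will be needed, but I do know that the total number of added characters will be less than the length of the original string), then iterate over the original string, copying characters into the new array, and whenever the pattern is recognized, insert the new string and then continue copying characters from the original string. Can anyone think of a faster or better way? This will be done frequently, so I want to optimize it as much as possible.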

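Update: Some people have suggested going the std::string route rather than a character array, to avoid the memory management that comes with character arrays.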

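The pattern I'm looking for is a 5-character string; after it I keep scanning until I see a newline, and at that point I append 3 or 5 characters. I would implement it like this: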

bool matchedstart = false;
std::string origstr;
std::string outputstr;
const std::size_t strlength = origstr.length();
outputstr.reserve(strlength * 2);  // the added text is shorter than the original
for (std::size_t i = 0; i < strlength; i++) {
    if (!matchedstart) {
        // i + 5 <= strlength avoids the unsigned underflow of length() - 5
        if (i + 5 <= strlength &&
            origstr[i] == 't' && origstr[i+1] == 'n' && origstr[i+2] == 'a' /* && ... */) {
            matchedstart = true;
        }
    }
    else if (origstr[i] == '\n') {
        // append extra text to outputstr here
        matchedstart = false;
    }
    outputstr.push_back(origstr[i]);
}

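Is this algorithm more efficient than string.find()? I suspect it is, because I have hard-coded the pattern into the algorithm above. I suspect string.find() involves a short inner for loop proportional to the length of the search string, although that may not cost much more time than the compiler-optimized short-circuit evaluation in my if-chain. I guess I'll have to profile this to see how much overhead string involves. I'll post my findings later.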
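For the profiling mentioned above, a rough harness along the following lines might be enough to compare the two approaches. This is only a sketch: the marker "BEGIN" is a made-up 5-character stand-in for the real pattern, and `count_manual`, `count_find`, and `time_us` are hypothetical names. A single timed call is noisy, so in practice each measurement would be repeated many times on realistic input.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hand-coded if-chain against the hypothetical marker "BEGIN".
std::size_t count_manual(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i + 5 <= s.length(); ++i) {
        if (s[i] == 'B' && s[i + 1] == 'E' && s[i + 2] == 'G' &&
            s[i + 3] == 'I' && s[i + 4] == 'N') {
            ++count;
            i += 4;  // skip the rest of the match; the loop adds the final +1
        }
    }
    return count;
}

// The same count using std::string::find.
std::size_t count_find(const std::string& s)
{
    const std::string marker = "BEGIN";  // assumption: illustrative marker only
    std::size_t count = 0;
    for (auto pos = s.find(marker); pos != std::string::npos;
         pos = s.find(marker, pos + marker.length()))
        ++count;
    return count;
}

// Runs f(s) once, stores its result, and returns the elapsed microseconds.
template <typename F>
long long time_us(F f, const std::string& s, std::size_t& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = f(s);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `time_us(count_manual, text, n)` and `time_us(count_find, text, n)` on the same input gives two timings to compare, and checking that both counts agree guards against measuring a broken implementation.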

Answer 1

You can use std::string, which has find() and insert() methods, for example:

std::string str = "whatever you want to search in...";
std::string seq = "what to find";

auto pos = str.find(seq);
if (pos != std::string::npos)
    str.insert(pos + seq.length(), "what to insert");

If the sequence can occur more than once, find() has an optional pos parameter that specifies the index at which to start searching:

std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";

auto pos = str.find(seq);
while (pos != std::string::npos)
{
    pos += seq.length();
    str.insert(pos, ins);
    pos = str.find(seq, pos + ins.length());
}

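Since you say you "know that the total number of added characters will be less than the length of the original string", you can use std::string::reserve() to increase the string's capacity and avoid reallocations during the inserts: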

std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";

auto pos = str.find(seq);
if (pos != std::string::npos)
{
    str.reserve(str.length() * 2);
    do
    {
        pos += seq.length();
        str.insert(pos, ins);
        pos = str.find(seq, pos + ins.length());
    }
    while (pos != std::string::npos);
    str.shrink_to_fit();
}

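Update: If insert() proves to be too slow, you might consider building up a second std::string instead, so that no time is wasted shifting characters around in the original std::string, for example: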

std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";
std::string newStr;

auto foundPos = str.find(seq);
if (foundPos == std::string::npos)
{
    newStr = str;
}
else
{
    newStr.reserve(str.length() * 2);
    decltype(foundPos) startPos = 0;
    auto ptr = str.c_str();
    do
    {
        foundPos += seq.length();
        newStr.append(ptr + startPos, foundPos - startPos);
        newStr.append(ins);
        startPos = foundPos;
        foundPos = str.find(seq, startPos);
    }
    while (foundPos != std::string::npos);
    newStr.append(ptr + startPos, str.length() - startPos);
}
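
Applied to the case in the question (find the pattern, keep scanning to the end of that line, then add text there), the second-string approach might look roughly like the sketch below. The function name `append_to_matching_lines` and its parameters are hypothetical, and it assumes the extra text belongs just before the newline, which is how the loop in the question behaves.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: for every line that contains `pattern`, insert
// `addition` just before that line's terminating '\n'.
std::string append_to_matching_lines(const std::string& text,
                                     const std::string& pattern,
                                     const std::string& addition)
{
    assert(!pattern.empty());

    std::string out;
    out.reserve(text.length() * 2);  // assumption: additions total less than the input

    std::string::size_type copied = 0;                  // start of the part not yet copied
    std::string::size_type found = text.find(pattern);  // next occurrence of the pattern

    while (found != std::string::npos)
    {
        // Keep scanning until the newline that ends the matching line.
        const auto eol = text.find('\n', found + pattern.length());
        if (eol == std::string::npos)
            break;  // match on an unterminated last line: nothing is appended

        out.append(text, copied, eol - copied);  // everything up to, but not including, '\n'
        out.append(addition);
        copied = eol;                            // the '\n' itself is copied on the next pass
        found = text.find(pattern, eol);
    }

    out.append(text, copied, std::string::npos);  // remaining tail
    return out;
}
```

Each character of the input is copied exactly once, and a line that contains the pattern several times still receives the addition only once.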

Answer 2

First off, use std::string instead of torturing yourself with character arrays.

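Your approach is pretty good. The only thing I can think of to optimize is the part where you search for the pattern. What you describe appears to be naive string searching, where you try to match the pattern at every position. That takes O(nm), where n is the length of the text and m is the length of the pattern, but there are algorithms that can do it faster.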

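You should use std::string::find, which should provide a reasonably efficient algorithm that runs in something like O(n + m), although the standard does not guarantee it.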
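
If you would rather not depend on how a particular library implements find(), C++17 lets you pick a search algorithm explicitly. The sketch below uses std::boyer_moore_searcher with std::search; it is not what std::string::find is guaranteed to do internally, the helper name `find_all` is made up for illustration, and whether it actually beats find() for a 5-character pattern is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: start index of every non-overlapping occurrence
// of `pattern` in `text`, located with a Boyer-Moore searcher (C++17).
std::vector<std::size_t> find_all(const std::string& text, const std::string& pattern)
{
    assert(!pattern.empty());

    std::vector<std::size_t> positions;

    // The pattern is preprocessed once here and reused for every search below.
    const std::boyer_moore_searcher<std::string::const_iterator>
        searcher(pattern.begin(), pattern.end());

    auto it = text.begin();
    while (true)
    {
        it = std::search(it, text.end(), searcher);
        if (it == text.end())
            break;  // no further occurrence
        positions.push_back(static_cast<std::size_t>(it - text.begin()));
        it += static_cast<std::string::difference_type>(pattern.length());  // resume after the match
    }
    return positions;
}
```

The returned indices can then drive the same append loop as in the first answer, with each index taking the place of a find() result.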

C++ string insertion
