How do I extract a string between braces at the same nesting level?

Asked: 2016-07-28 01:15:56

Tags: c++ string string-parsing

Imagine I have an unknown string that follows this format:

Blablabla
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
    2
    {
        "Sub Text"
         99
    }
}
Blablabla2
{
    "Some Text"
    2
    {
        "Sub Text"
         99
    }
}

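I need to be able to extract from this string every substring that sits between the first level of delimiters ({}). So, in this example, running the following function: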

ExtractStringBetweenDelimitersOnSameLevel(string, "{", "}")

should extract the following substring from the original string and return it:

    "Some Text"
    2
    {
        "Sub Text"
         99
    }

The problem is that it returns a shorter string because of the second-level delimiters.

Here is my code:

const int Count(
   const std::string& haystack,
   const std::string& needle,
   const int starting_index,
   const int maximum_index)
{
   int total = 0;
   int offset = starting_index;

   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      if (current_index >= maximum_index)
      {
         break;
      }

      total++;
      offset = static_cast<int>(current_index + needle.size());
   }

   return total;
}

const size_t FindNthDelimiter(
   const std::string& haystack,
   const std::string& needle,
   const int nth)
{
   int total_found = 0;
   int offset = 0;

   size_t current_index = std::string::npos;
   while ((current_index = haystack.find(needle, offset)) != std::string::npos)
   {
      total_found++;
      offset = static_cast<int>(current_index) + 1;

      if (total_found == nth)
      {
         return offset;
      }
   }

   std::cout << "String does not have nth element." << std::endl;

   return offset;
}

std::string ExtractStringBetweenDelimitersOnSameLevel(
   std::string& original_string,
   const std::string& opening_delimiter,
   const std::string& closing_delimiter)
{
   // Find the first delimiter...
   const size_t first_delimiter = original_string.find(opening_delimiter);
   if (first_delimiter != std::string::npos)
   {
      const size_t second_delimiter = original_string.find(closing_delimiter);
      if (second_delimiter != std::string::npos)
      {
         // Total first delimiters found until first closed delimiter...
         int total_first_delimiters = Count(original_string, opening_delimiter, static_cast<int>(first_delimiter), static_cast<int>(second_delimiter));
         const size_t index_of_nth_closer = FindNthDelimiter(original_string, closing_delimiter, total_first_delimiters);

         std::string needle = original_string.substr(first_delimiter + opening_delimiter.size(), index_of_nth_closer - opening_delimiter.size() - 1);
         original_string.erase(first_delimiter, index_of_nth_closer + closing_delimiter.size());

         return needle;
      }
   }

   return "";
}

1 Answer:

Answer 0 (score: 1)

  

"The more they overthink the plumbing, the easier it is to stop up the drain." - Scotty, Star Trek III.

The code shown seems over-engineered for such a simple task.

It also does not appear to fully implement the stated task. The task is described as extracting every top-level string:

  

every substring that sits between the first level of delimiters

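But the code shown appears to extract only the first one. Trying to work out where the complicated algorithm goes wrong is not worth the effort. It is easier to rewrite it so that it does the whole task in half the original size. The core algorithm should not need more than a dozen or two lines of code, whereas the code that extracts only the first string is already several times longer than that.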

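The following example extracts every top-level string between matching {} delimiters and passes it to a lambda callback. main() provides a sample lambda that prints each string to std::cout: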

#include <string>
#include <algorithm>
#include <iostream>

template<typename functor_type> void ExtractStringBetweenDelimitersOnSameLevel(
    const std::string &original_string,
    char opening_delimiter, // Should be '{'
    char closing_delimiter, // Should be '}'
    functor_type &&functor) // Lambda that receives each string.
{
    auto b=original_string.begin(), e=original_string.end(), p=b;

    int nesting_level=0;

    while (b != e)
    {
        if (*b == closing_delimiter)
        {
            if (nesting_level > 0 && --nesting_level == 0)
            {
                functor(std::string(p, b));
            }
        }

        if (*b++ == opening_delimiter)
        {
            if (nesting_level++ == 0)
                p=b;
        }
    }
}


int main()
{
    std::string search_string="\n"
        "Blablabla\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}\n"
        "Blablabla2\n"
        "{\n"
        "    \"Some Text\"\n"
        "    2\n"
        "    {\n"
        "        \"Sub Text\"\n"
        "         99\n"
        "    }\n"
        "}";

    ExtractStringBetweenDelimitersOnSameLevel
        (search_string,
         '{',
         '}',
         [](const std::string &string)
         {
             std::cout << "Extracted: " << string << std::endl;
         });
}

Your homework assignment is to modify it to handle multi-character delimiters. That should not be much more complicated, either.
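
For reference, here is one way the multi-character variant might look. This is only a sketch, not part of the original answer: the name ExtractTopLevelBlocks is made up for illustration, it returns a std::vector instead of invoking a callback, and it assumes the opening and closing delimiters are non-empty and different from each other. It keeps the same nesting counter and compares whole delimiter strings at the current position instead of single characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): collects every top-level
// substring enclosed by `opening` and `closing`, either of which may be
// longer than one character. Assumes the two delimiters differ.
std::vector<std::string> ExtractTopLevelBlocks(
    const std::string &text,
    const std::string &opening,
    const std::string &closing)
{
    std::vector<std::string> blocks;
    if (opening.empty() || closing.empty())
        return blocks; // Nothing sensible to match against.

    std::size_t pos = 0;         // Current scan position.
    std::size_t block_start = 0; // First character after the top-level opener.
    int nesting_level = 0;

    while (pos < text.size())
    {
        if (text.compare(pos, closing.size(), closing) == 0)
        {
            // A closer that returns us to level 0 ends a top-level block.
            // Unmatched closers (level already 0) are skipped.
            if (nesting_level > 0 && --nesting_level == 0)
                blocks.push_back(text.substr(block_start, pos - block_start));
            pos += closing.size();
        }
        else if (text.compare(pos, opening.size(), opening) == 0)
        {
            pos += opening.size();
            // Only the outermost opener marks the start of a block.
            if (nesting_level++ == 0)
                block_start = pos;
        }
        else
        {
            ++pos;
        }
    }
    return blocks;
}
```

With "<<" and ">>" as delimiters, the input "a<<b<<c>>d>>e<<f>>" yields two blocks, "b<<c>>d" and "f". A block that is opened but never closed is dropped, the same as in the single-character version.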