C++: How do I split a string into evenly sized strings?

Time: 2011-11-21 05:38:13

Tags: c++ string algorithm

In C++, how do I split a string into evenly sized strings?

For example, I have the string "012345678" and want to split it into 5 smaller strings, which should give me something like "01", "23", "45", "67", "8".

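I can't figure out how long the smaller strings should be. In the example above, the original string has size 9 and I want to split it into 5 smaller strings, so every smaller string except the last would have length 9/5 = 1, but then the last one would have length 9 - 1 * 4 = 5, which is unacceptable.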

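So the formal definition of the problem: the original string is split into exactly n substrings, and no two substrings may differ in length by more than 1.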
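To make the requirement concrete, here is a minimal sketch (not from the question) that encodes the formal definition as a check and reproduces the naive approach described above. Both helper names, isValidSplit and naiveSplit, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check of the question's definition: exactly n parts that
// concatenate back to the original, with no two lengths differing by more than 1.
bool isValidSplit(const std::string& original,
                  const std::vector<std::string>& parts, std::size_t n) {
    if (parts.size() != n) return false;
    if (parts.empty()) return original.empty();  // n == 0: only "" splits into nothing
    std::string joined;
    std::size_t shortest = parts.front().size();
    std::size_t longest = shortest;
    for (const std::string& p : parts) {
        joined += p;
        if (p.size() < shortest) shortest = p.size();
        if (p.size() > longest) longest = p.size();
    }
    return joined == original && longest - shortest <= 1;
}

// The naive approach from the question: each part gets size / n characters
// and the last part takes whatever is left over.
std::vector<std::string> naiveSplit(const std::string& s, std::size_t n) {
    assert(n > 0);
    const std::size_t len = s.size() / n;
    std::vector<std::string> parts;
    for (std::size_t i = 0; i + 1 < n; ++i)
        parts.push_back(s.substr(i * len, len));
    parts.push_back(s.substr((n - 1) * len));  // remainder goes to the last part
    return parts;
}
```

With "012345678" and 5 parts, naiveSplit yields lengths 1, 1, 1, 1, 5, so the check fails, while the split from the example ("01", "23", "45", "67", "8") passes.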

My focus is not on C++ syntax or libraries. It is on how to design the algorithm so that the returned strings are nearly equal in size.

6 answers:

Answer 0 (score: 5)

To divide N items into M parts whose lengths differ by at most one, you can use the formula (N*i+N)/M - (N*i)/M as the length of the i'th part, as shown below.

 #include <string>
 #include <iostream>
 using namespace std;

 int main() {
   string text = "abcdefghijklmnopqrstuvwxyz";
   int N = text.length();
   for (int M=3; M<14; ++M) {
     cout <<" length:"<< N <<"  parts:"<< M << "\n";
     int at, pre=0, i;
     for (pre = i = 0; i < M; ++i) {
       at = (N+N*i)/M;
       cout << "part " << i << "\t" << pre << "\t" << at;
       cout << "\t" << text.substr(pre, at-pre) << "\n";
       pre = at;
     }
   }
   return 0;
 } 
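A short derivation of why the formula works (added here, not part of the original answer). Write the length of part i with integer division as a floor, where i runs from 0 to M-1; the sum telescopes to N, and each length is pinned between floor(N/M) and floor(N/M) + 1 using the floor inequality with y = Ni/M and d = N/M:

```latex
\ell_i = \left\lfloor \frac{N(i+1)}{M} \right\rfloor - \left\lfloor \frac{Ni}{M} \right\rfloor,
\qquad i = 0, \dots, M-1

\sum_{i=0}^{M-1} \ell_i
  = \left\lfloor \frac{NM}{M} \right\rfloor - \left\lfloor \frac{0}{M} \right\rfloor = N
\quad \text{(the sum telescopes)}

\lfloor y \rfloor + \lfloor d \rfloor \le \lfloor y + d \rfloor \le \lfloor y \rfloor + \lfloor d \rfloor + 1
\;\Longrightarrow\;
\left\lfloor \frac{N}{M} \right\rfloor \le \ell_i \le \left\lfloor \frac{N}{M} \right\rfloor + 1
```

So every part has one of only two possible lengths, and no two parts differ by more than 1.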

For example, when M is 4 or 5, the code above produces:

  length:26  parts:4
 part 0 0   6   abcdef
 part 1 6   13  ghijklm
 part 2 13  19  nopqrs
 part 3 19  26  tuvwxyz
  length:26  parts:5
 part 0 0   5   abcde
 part 1 5   10  fghij
 part 2 10  15  klmno
 part 3 15  20  pqrst
 part 4 20  26  uvwxyz
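The same formula can be packaged as a reusable function that returns the parts instead of printing them. This is a minimal sketch; the name splitEven is hypothetical, and std::size_t is used instead of int to postpone overflow of N * (i + 1) on large inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of this answer's formula as a function (name is hypothetical).
// Part i spans [N*i/M, N*(i+1)/M) with integer division, so every part
// has length floor(N/M) or floor(N/M) + 1.
std::vector<std::string> splitEven(const std::string& text, std::size_t M) {
    assert(M > 0);  // M == 0 would divide by zero
    const std::size_t N = text.size();
    std::vector<std::string> parts;
    parts.reserve(M);
    std::size_t pre = 0;
    for (std::size_t i = 0; i < M; ++i) {
        const std::size_t at = N * (i + 1) / M;  // end of part i (exclusive)
        parts.push_back(text.substr(pre, at - pre));
        pre = at;
    }
    return parts;
}
```

For the question's input, splitEven("012345678", 5) gives "0", "12", "34", "56", "78": the short part comes first rather than last as in the question's example, but the result still satisfies the "differ by at most 1" requirement.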

Answer 1 (score: 4)

My solution:

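#include <cstddef>
#include <string>
#include <vector>

// Split s into count tokens whose lengths differ by at most one: the first
// (s.size() % count) tokens get one extra character. count must be > 0.
std::vector<std::string> split(std::string const & s, size_t count)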
{
       size_t minsize = s.size()/count;
       int extra = s.size() - minsize * count;
       std::vector<std::string> tokens;
       for(size_t i = 0, offset=0 ; i < count ; ++i, --extra)
       {
          size_t size = minsize + (extra>0?1:0);
          if ( (offset + size) < s.size())
               tokens.push_back(s.substr(offset,size));
          else
               tokens.push_back(s.substr(offset, s.size() - offset));
          offset += size;
       }       
       return tokens;
}

Test code:

#include <algorithm>
#include <iostream>
#include <iterator>

int main()
{
      std::string s;
      while (std::cin >> s)
      {
        std::vector<std::string> tokens = split(s, 5);
        //output
        std::copy(tokens.begin(), tokens.end(), 
              std::ostream_iterator<std::string>(std::cout, ", "));
        std::cout << std::endl;
      }
}

Input:

012345
0123456
01234567
012345678
0123456789
01234567890

Output:

01, 2, 3, 4, 5, 
01, 23, 4, 5, 6, 
01, 23, 45, 6, 7, 
01, 23, 45, 67, 8, 
01, 23, 45, 67, 89, 
012, 34, 56, 78, 90, 

Online demo: http://ideone.com/gINtK

This solution makes the tokens as even as possible; that is, not all tokens are necessarily the same size.
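The answer's claim can be checked by brute force. In this sketch, split() is a compact restatement of the answer's function: it hands out the extra characters the same way but drops the bounds check, which is redundant because the token sizes always sum to exactly s.size(). The checker name checkAllSplits is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compact restatement of the answer's split(): the first (size % count)
// tokens get one extra character. substr() is always called with
// offset + size <= s.size() here, so no separate bounds check is needed.
std::vector<std::string> split(const std::string& s, std::size_t count) {
    assert(count > 0);
    const std::size_t minsize = s.size() / count;
    const std::size_t extra = s.size() % count;
    std::vector<std::string> tokens;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t size = minsize + (i < extra ? 1 : 0);
        tokens.push_back(s.substr(offset, size));
        offset += size;
    }
    return tokens;
}

// Hypothetical brute-force check: for every length up to maxLen and every
// count from 1 to maxCount, the tokens reassemble the input and their
// lengths differ by at most 1.
bool checkAllSplits(std::size_t maxLen, std::size_t maxCount) {
    for (std::size_t len = 0; len <= maxLen; ++len) {
        std::string s;
        for (std::size_t k = 0; k < len; ++k)
            s += static_cast<char>('a' + k % 26);
        for (std::size_t count = 1; count <= maxCount; ++count) {
            const std::vector<std::string> tokens = split(s, count);
            std::string joined;
            std::size_t shortest = tokens.front().size();
            std::size_t longest = shortest;
            for (const std::string& t : tokens) {
                joined += t;
                shortest = std::min(shortest, t.size());
                longest = std::max(longest, t.size());
            }
            if (tokens.size() != count || joined != s || longest - shortest > 1)
                return false;
        }
    }
    return true;
}
```

Calling checkAllSplits(30, 12) exercises 31 lengths times 12 counts, including counts larger than the string length, where some tokens are empty.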

Answer 2 (score: 1)

It is enough to know the lengths of the substrings.
Assume m is the size() of the string:

int k = (m%n == 0)? n : n-m%n;  

Then k of the substrings should have length m/n, and the remaining n-k should have length m/n+1.
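A minimal sketch that turns this counting rule into an actual split (the function name splitByCount is hypothetical). The lengths add up because k*(m/n) + (n-k)*(m/n+1) = n*(m/n) + m%n = m. The conditional in the answer is not strictly needed: n - m%n already equals n when m%n == 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the counting rule (name is hypothetical): k = n - m % n parts
// get m / n characters and the other n - k parts get m / n + 1.
// Short parts are placed first here; any order would satisfy the requirement.
std::vector<std::string> splitByCount(const std::string& str, std::size_t n) {
    assert(n > 0);
    const std::size_t m = str.size();
    const std::size_t k = n - m % n;  // equals n when m % n == 0
    std::vector<std::string> parts;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t len = m / n + (i < k ? 0 : 1);
        parts.push_back(str.substr(offset, len));
        offset += len;
    }
    return parts;
}
```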

Answer 3 (score: 0)

Try substr.
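For example, once the part lengths are known (from any of the other answers), std::string::substr(pos, count) can do the cutting. A minimal sketch; the helper name cutByLengths is hypothetical. substr returns at most count characters, clamping at the end of the string, returns an empty string when pos equals size(), and throws std::out_of_range when pos is past the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cut s into consecutive pieces with the given lengths.
// substr(pos, len) clamps len at the end of the string.
std::vector<std::string> cutByLengths(const std::string& s,
                                      const std::vector<std::size_t>& lengths) {
    std::vector<std::string> pieces;
    std::size_t pos = 0;
    for (std::size_t len : lengths) {
        assert(pos <= s.size());  // substr would throw std::out_of_range otherwise
        pieces.push_back(s.substr(pos, len));
        pos += len;
    }
    return pieces;
}
```

Using the lengths from the question's example, cutByLengths("012345678", {2, 2, 2, 2, 1}) yields "01", "23", "45", "67", "8".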

Answer 4 (score: 0)

You can get iterators at the positions where you want to split, then use them to construct the new strings. For example:

std::string s1 = "string to split";
std::string::iterator halfway = s1.begin() + s1.size() / 2;
std::string s2(s1.begin(), halfway);
std::string s3(halfway, s1.end());
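The same iterator idea generalizes from two halves to n parts. In this sketch (the name splitWithIterators is hypothetical), boundary i is placed at begin() + size * i / n, which borrows the integer-division rule from Answer 0, so part lengths differ by at most 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generalization of the two-halves example to n parts.
// Each part is built with the std::string(first, last) iterator-range constructor.
std::vector<std::string> splitWithIterators(const std::string& s, std::size_t n) {
    assert(n > 0);
    std::vector<std::string> parts;
    auto first = s.begin();
    for (std::size_t i = 1; i <= n; ++i) {
        auto last = s.begin() + static_cast<std::ptrdiff_t>(s.size() * i / n);
        parts.emplace_back(first, last);
        first = last;
    }
    return parts;
}
```

With n = 2 this reproduces the answer's split of "string to split" into "string " and "to split".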

Answer 5 (score: 0)

Let's say the string has length L and must be split into n substrings.

# Find the next multiple of `n` greater than or equal to `L`

L = 9
n = 5

LL = n * (L // n)  # integer division, so this also runs under Python 3
if LL < L:
    LL += n

# Split a string of length LL into n equal sizes. The string is at
# most (n-1) longer than L.

lengths = [LL // n for x in range(n)]

# Remove one from the first (or any) (LL-L) elements.
for i in range(LL - L):
    lengths[i] = lengths[i] - 1

# Get indices from lengths.
s = 0
idx = []
for i in lengths:
    idx.append(s)
    s = s + i
idx.append(L)

print(idx)  # [0, 1, 3, 5, 7, 9] for L = 9, n = 5

Edit: OK, OK, I forgot it was supposed to be C++.

Edit: Here it is...

#include <vector>
#include <iostream>

unsigned int L = 13;
unsigned int n = 5;

int
main ()
{
  unsigned int i;  /* unsigned, to match n and LL in the loop comparisons */
  unsigned int LL;
  std::vector<int> lengths, idx;

  /* Find the next multiple of `n` greater than or equal to `L` */
  LL = n * (L / n);
  if (LL < L)
    LL += n;

  /* Split a string of length LL into n equal sizes. The string is at
     most (n-1) longer than L. */
  for (i = 0; i < n; ++i)
    lengths.push_back (LL/n);

  /*  Remove one from the first (or any) (LL-L) elements. */
  for (i = 0; i < LL - L; ++i)
    --lengths [i];

  /* Get indices from lengths.  */
  int s = 0;
  for (auto &ii: lengths)
    {
      idx.push_back (s);
      s += ii;
    }

  idx.push_back (L);

  for (auto &i : idx)
    std::cout << i << " ";

  std::cout << std::endl;
  return 0;
}
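The program above prints only the boundary indices. A small follow-up sketch (the helper piecesFromIndices is hypothetical) turns such a boundary list into the substrings themselves. With the answer's values L = 13 and n = 5, idx is {0, 2, 4, 7, 10, 13}.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical follow-up: given boundaries idx (first entry 0, last entry
// s.size()), piece j is the substring spanning [idx[j], idx[j + 1]).
std::vector<std::string> piecesFromIndices(const std::string& s,
                                           const std::vector<int>& idx) {
    assert(!idx.empty() && idx.back() == static_cast<int>(s.size()));
    std::vector<std::string> pieces;
    for (std::size_t j = 0; j + 1 < idx.size(); ++j)
        pieces.push_back(s.substr(idx[j], idx[j + 1] - idx[j]));
    return pieces;
}
```

For a 13-character string such as "abcdefghijklm", those boundaries give "ab", "cd", "efg", "hij", "klm".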