Algorithm: split a string into N parts using whitespace so that all parts have nearly the same length

Time: 2010-03-04 17:53:36

Tags: algorithm string split

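I am looking for an algorithm that takes a string and splits it into a given number of parts. The parts must contain complete words (so the string is split at whitespace), and they should be nearly the same length, or otherwise as long as possible.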

I know it is not hard to write a function that does what I want, but I wonder whether there is a well-proven, fast algorithm for this purpose.

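Edit: To clarify my question, here is the problem I am trying to solve.

I generate images with a fixed width. In these images I write user names using GD and FreeType in PHP. Since the width is fixed, I want to split the names into 2 or 3 lines if they do not fit on one.

To fill as much of the space as possible, I want to split the names so that each line contains as many words as possible. By this I mean that each line should hold as many words as it takes to keep its length close to the average line length of the whole text block. So if there is one long word followed by two short words, the two short words should share a line if that makes all lines roughly equal.

(I then compute the width of the text block using 1, 2, or 3 lines, and if it fits into my image I render it. If it still does not fit with 3 lines, I decrease the font size until everything fits.)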

Example: This is a long text should be displayed as something like:

This is a
long text

or:

This is
a long
text

but not:

This
is a long
text

and not:

This is a long
text

I hope this explains more clearly what I am looking for.

7 answers:

Answer 0 (score: 6)

If you are talking about line breaking, take a look at Dynamic Line Breaking, which gives a Dynamic Programming solution for dividing words into lines.
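As a rough illustration of the dynamic-programming idea (a sketch, not the exact algorithm from the linked article), the function below splits the words into exactly N lines while minimizing the sum of squared line lengths, which pushes the lines toward equal length. The name `splitBalanced` is hypothetical, and measuring lines in characters is an assumption; with GD and FreeType you would substitute a pixel-width measurement.

```javascript
// Sketch: split `text` into exactly `numLines` lines of whole words,
// minimizing the sum of squared line lengths (measured in characters).
// Returns null when there are fewer words than requested lines.
function splitBalanced(text, numLines) {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const n = words.length;
  if (numLines < 1 || n < numLines) return null;

  // prefix[i] = total characters in the first i words (spaces excluded)
  const prefix = [0];
  for (const w of words) prefix.push(prefix[prefix.length - 1] + w.length);
  // Length of words i..j-1 joined by single spaces
  const lineLen = (i, j) => prefix[j] - prefix[i] + (j - i - 1);

  // best[k][j]: minimal cost of placing the first j words on k lines
  // cut[k][j]:  index of the first word on line k in that best layout
  const best = Array.from({ length: numLines + 1 }, () => new Array(n + 1).fill(Infinity));
  const cut = Array.from({ length: numLines + 1 }, () => new Array(n + 1).fill(-1));
  best[0][0] = 0;

  for (let k = 1; k <= numLines; k++) {
    for (let j = k; j <= n; j++) {
      for (let i = k - 1; i < j; i++) {
        if (best[k - 1][i] === Infinity) continue;
        const len = lineLen(i, j);
        const cost = best[k - 1][i] + len * len;
        if (cost < best[k][j]) {
          best[k][j] = cost;
          cut[k][j] = i;
        }
      }
    }
  }

  // Walk the recorded cuts backwards to rebuild the lines
  const lines = [];
  let end = n;
  for (let k = numLines; k >= 1; k--) {
    const start = cut[k][end];
    lines.unshift(words.slice(start, end).join(" "));
    end = start;
  }
  return lines;
}

console.log(splitBalanced("This is a long text", 2)); // [ 'This is a', 'long text' ]
console.log(splitBalanced("This is a long text", 3)); // [ 'This is', 'a long', 'text' ]
```

For W words and N lines this runs in O(N·W²) time, which is negligible for user names.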

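Answer 1 (score: 3)

I don't know about proven, but the simplest and most efficient solution seems to be to divide the length of the string by N and then find the whitespace closest to each split position (you will want to search both forward and backward).

The code below seems to work, although there are plenty of error conditions it does not handle. It looks like it runs in O(n), where n is the number of parts you want, assuming each search finds a space close to its target index.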

using System;

class Program
{
    static void Main(string[] args)
    {
        var s = "This is a string for testing purposes. It will be split into 3 parts";
        var p = s.Length / 3;
        var w1 = 0;
        var w2 = FindClosestWordIndex(s, p);
        var w3 = FindClosestWordIndex(s, p * 2);
        Console.WriteLine(string.Format("1: {0}", s.Substring(w1, w2 - w1).Trim()));
        Console.WriteLine(string.Format("2: {0}", s.Substring(w2, w3 - w2).Trim()));
        Console.WriteLine(string.Format("3: {0}", s.Substring(w3).Trim()));
        Console.ReadKey();
    }

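    // Returns the index of the space closest to startIndex,
    // or -1 if the string contains no space in either direction.
    public static int FindClosestWordIndex(string s, int startIndex)
    {
        int wordAfterIndex = -1;
        int wordBeforeIndex = -1;

        // Search forward for the next space
        for (int i = startIndex; i < s.Length; i++)
        {
            if (s[i] == ' ')
            {
                wordAfterIndex = i;
                break;
            }
        }
        // Search backward for the previous space
        for (int i = startIndex; i >= 0; i--)
        {
            if (s[i] == ' ')
            {
                wordBeforeIndex = i;
                break;
            }
        }

        // If only one direction found a space, use it
        if (wordAfterIndex < 0)
            return wordBeforeIndex;
        if (wordBeforeIndex < 0)
            return wordAfterIndex;

        // Otherwise pick the closer one, preferring the space after on ties
        if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
            return wordAfterIndex;
        else
            return wordBeforeIndex;
    }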
}

The output is:

1: This is a string for
2: testing purposes. It will
3: be split into 3 parts

Answer 2 (score: 1)

Again, based on Brian's answer, I made a PHP version of his code:

<?php
// Input text
$txt = "This is a really long string that should be broken up onto lines of about the same number of characters.";

// Number of lines
$numLines = 3;

/* Do it, result comes as an array: */
$aResult = splitLinesByClosestWhitespace($txt, $numLines);

/* Output result: */
if ($aResult) 
{
    for ($x=1; $x<=$numLines; $x++) 
        echo "Line ".$x.": ".$aResult[$x]."<br>";

} else {
    echo "Not enough spaces to generate the lines!";
}



/**********************/



/**
 * Splits a string into multiple lines of the closest possible same length, 
 * using the closest whitespaces
 * @param string $txt   String to split
 * @param integer $numLines Number of lines
 * @return array|false
 */
 function splitLinesByClosestWhitespace($txt, $numLines) 
 {
   $p           = intval( strlen($txt) / $numLines );
   $aTxtIndx    = array();
   $aTxt        = array();

   // Check we have enough whitespaces to generate the number of lines
   $wsCount = count( explode(" ", $txt) ) - 1;
   if ($wsCount<$numLines)
       return false;

   // Get the indexes
   for ($x=1;  $x<=$numLines; $x++) 
   {
       $aTxtIndx[$x] = FindClosestWordIndex($txt, $p * ($x-1) );
   }

   // Do the split
   for ($x=1;  $x<=$numLines; $x++) 
   {
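       // substr() takes a length, not an end index; trim the extracted text, not the index
       if ($x != $numLines)
           $aTxt[$x] = trim( substr( $txt, $aTxtIndx[$x], $aTxtIndx[$x+1] - $aTxtIndx[$x] ) );
       else
           $aTxt[$x] = trim( substr( $txt, $aTxtIndx[$x] ) );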
   }

   return $aTxt;
 }


/**
 * Finds the closest word to a string index
 * @param string $s String to search
 * @param integer $startIndex   Index at which to find the closest word
 * @return integer
 */
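function FindClosestWordIndex($s, $startIndex) 
{
    $wordAfterIndex = -1;   // -1 means no space at or after $startIndex
    $wordBeforeIndex = 0;   // falls back to the start of the string
    $len = strlen($s);

    // Search forward for the next space
    for ($i = $startIndex; $i < $len; $i++)
    {
        if ($s[$i] == ' ')
        {
            $wordAfterIndex = $i;
            break;
        }
    }
    // Search backward for the previous space
    for ($i = $startIndex; $i >= 0; $i--)
    {
        if ($s[$i] == ' ')
        {
            $wordBeforeIndex = $i;
            break;
        }
    }

    // No space after the index: the one before is the only candidate
    if ($wordAfterIndex < 0)
        return $wordBeforeIndex;

    if ($wordAfterIndex - $startIndex <= $startIndex - $wordBeforeIndex)
        return $wordAfterIndex;
    else
        return $wordBeforeIndex;
}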

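Answer 3 (score: 0)

Partitioning into parts of equal size is NP-Complete. (That result applies when items may be freely assigned to parts; here the word order is fixed, so the best split can be found in polynomial time with dynamic programming.)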

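Answer 4 (score: 0)

Answer 5 (score: 0)

The usual way to implement word wrap is to put as many words as possible on a line and move to the next line when there is no room left. Of course, this assumes you have a maximum width.

Whatever algorithm you use, remember that unless you are using a fixed-width font, you want to work with the physical width of the words, not the number of letters.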
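A minimal sketch of that greedy approach, assuming a caller-supplied `measure` function that returns the rendered width of a string (for example a wrapper around a font-metrics call). The names `greedyWrap` and `measure` are hypothetical, and the default simply counts characters, which is only appropriate for a fixed-width font.

```javascript
// Greedy word wrap: keep adding words to the current line while it still fits.
// `measure` is an assumed callback returning the width of a string;
// the default counts characters.
function greedyWrap(text, maxWidth, measure = (s) => s.length) {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    // A word that is too wide on its own still gets a line to itself
    if (current === "" || measure(candidate) <= maxWidth) {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

console.log(greedyWrap("This is a long text", 9));  // [ 'This is a', 'long text' ]
console.log(greedyWrap("This is a long text", 14)); // [ 'This is a long', 'text' ]
```

The second call shows why greedy wrapping alone does not balance lines: it fills the first line and leaves a short remainder, which is the layout the question wants to avoid.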

Answer 6 (score: 0)

Based on Brian's answer, I made a JavaScript version of his code: http://jsfiddle.net/gmoz22/CPGY2/

// Input text
var txt = "This is a really long string that should be broken up onto lines of about the same number of characters.";

// Number of lines
var numLines = 3;

/* Do it, result comes as an array: */
var aResult = splitLinesByClosestWhitespace(txt, numLines);

/* Output result: */
if (aResult) 
{
    for (var x = 1; x<=numLines; x++) 
        document.write( "Line "+x+": " + aResult[x] + "<br>" );

} else {
    document.write("Not enough spaces to generate the lines!");
}


/**********************/
// Original algorithm by http://stackoverflow.com/questions/2381525/algorithm-split-a-string-into-n-parts-using-whitespaces-so-all-parts-have-nearl/2381772#2381772, rewritten for JavaScript by Steve Oziel


/**
 * Trims a string for older browsers
 * Defined only if trim() is not already available on String.prototype,
 * since overriding a native method is a huge performance hit (this check is generally recommended when extending native objects)
 */
if (!String.prototype.trim) 
{
    String.prototype.trim = function(){return this.replace(/^\s+|\s+$/g, '');};
}


/**
 * Splits a string into multiple lines of the closest possible same length, 
 * using the closest whitespaces
 * @param {string} txt  String to split
 * @param {integer} numLines    Number of lines
 * @returns {Array}
 */
function splitLinesByClosestWhitespace(txt, numLines) 
{
    var p           = Math.floor(txt.length / numLines);
    var aTxtIndx    = [];
    var aTxt        = [];

    // Check we have enough whitespaces to generate the number of lines
    var wsCount = txt.split(" ").length - 1;
    if (wsCount<numLines)
        return false;

    // Get the indexes
    for (var x=1;  x<=numLines; x++) 
    {
        aTxtIndx[x] = FindClosestWordIndex(txt, p * (x-1) );
    }
    // Do the split
    for (var x=1;  x<=numLines; x++) 
    {
        if (x != numLines)
            aTxt[x] = txt.slice(aTxtIndx[x], aTxtIndx[x+1]).trim();
        else 
            aTxt[x] = txt.slice(aTxtIndx[x]).trim();
    }

    return aTxt;
}

/**
 * Finds the closest word to a string index
 * @param {string} s    String to search
 * @param {integer} startIndex Index at which to find the closest word
 * @returns {integer}
 */
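function FindClosestWordIndex(s, startIndex) 
{
    var wordAfterIndex = -1;  // -1 means no space at or after startIndex
    var wordBeforeIndex = 0;  // falls back to the start of the string

    // Search forward for the next space
    for (var i = startIndex; i < s.length; i++)
    {
        if (s[i] === ' ')
        {
            wordAfterIndex = i;
            break;
        }
    }
    // Search backward for the previous space
    for (var i = startIndex; i >= 0; i--)
    {
        if (s[i] === ' ')
        {
            wordBeforeIndex = i;
            break;
        }
    }

    // No space after the index: the one before is the only candidate
    if (wordAfterIndex < 0)
        return wordBeforeIndex;

    if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
        return wordAfterIndex;
    else
        return wordBeforeIndex;
}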

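It works fine as long as the requested number of lines is not too close to the number of spaces. In the example I gave there are 19 spaces, and it starts to go wrong when you ask to split it into 17, 18, or 19 lines. Edits welcome!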
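One possible way around that limitation, offered only as a sketch: collect the positions of all spaces first, then pick each cut from the spaces that are still unused while reserving enough of them for the remaining cuts, so two cuts can never land on the same space. The function name `splitAtNearestSpaces` is hypothetical, and the code assumes words are separated by single spaces with no leading or trailing whitespace.

```javascript
// Sketch: choose numLines - 1 distinct spaces, each as close as possible
// to its ideal cut position, always leaving enough spaces for later cuts.
// Returns null when the text has too few spaces.
function splitAtNearestSpaces(text, numLines) {
  const spaces = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === " ") spaces.push(i);
  }
  const cutsNeeded = numLines - 1;
  if (numLines < 1 || spaces.length < cutsNeeded) return null;

  const lines = [];
  let lineStart = 0; // index in text where the current line begins
  let nextSpace = 0; // first entry of `spaces` that is still unused
  for (let k = 1; k <= cutsNeeded; k++) {
    const target = (text.length * k) / numLines;
    // Last usable entry: the remaining cuts each need a later space
    const last = spaces.length - (cutsNeeded - k) - 1;
    let bestIdx = nextSpace;
    for (let j = nextSpace + 1; j <= last; j++) {
      if (Math.abs(spaces[j] - target) < Math.abs(spaces[bestIdx] - target)) {
        bestIdx = j;
      }
    }
    lines.push(text.slice(lineStart, spaces[bestIdx]));
    lineStart = spaces[bestIdx] + 1;
    nextSpace = bestIdx + 1;
  }
  lines.push(text.slice(lineStart));
  return lines;
}

console.log(splitAtNearestSpaces("This is a long text", 3)); // [ 'This is', 'a long', 'text' ]
```

With the 20-word example string above, asking for 20 lines yields one word per line, and asking for 21 returns null.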