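I am looking for an algorithm that takes a string and splits it into a given number of parts. The parts should contain whole words (so the string is split on whitespace) and should be of nearly the same length, or contain the longest possible parts.
I know it is not hard to write a function that does what I want, but I wonder whether there is a well-proven, fast algorithm for this purpose.
Edit: To clarify my question, I'll describe the problem I am trying to solve.
I generate images with a fixed width. Into these images I write user names using GD and Freetype in PHP. Since the width is fixed, I want to split a name into 2 or 3 lines if it does not fit on one.
To fill as much of the space as possible, I want to split the names so that each line holds as many words as possible. By that I mean each line should hold as many words as it takes to keep its length close to the average line length of the whole text block. So if there is one long word and two short words, the two short words should share a line if that makes all lines about equally long.
(I then compute the width of the text block using 1, 2 or 3 lines, and if it fits into my image I render it. If there are 3 lines and it still does not fit, I decrease the font size until everything fits.)
Example: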
This is a long text
should be displayed something like:
This is a
long text
or:
This is
a long
text
but not:
This
is a long
text
and also not:
This is a long
text
I hope this explains more clearly what I am looking for.
Answer 0 (score: 6)
If you are talking about line breaking, have a look at Dynamic Line Breaking, which gives a dynamic programming solution for dividing words into lines.
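To make the dynamic programming idea concrete, here is a minimal sketch. It is not necessarily the formulation the linked material uses: it assumes the goal is to minimize the length of the longest line, it counts characters rather than rendered pixels, and the name `splitBalanced` is made up for illustration.

```javascript
// Sketch: split `text` into exactly `numLines` lines of whole words so that
// the longest line is as short as possible (character counts, not pixels).
// Returns null when there are fewer words than lines.
function splitBalanced(text, numLines) {
  const words = text.trim().split(/\s+/).filter(w => w.length > 0);
  const n = words.length;
  if (numLines < 1 || n < numLines) return null;

  // prefix[i] = total characters in the first i words (spaces excluded)
  const prefix = [0];
  for (const w of words) prefix.push(prefix[prefix.length - 1] + w.length);

  // Length of a line holding words i..j-1 joined by single spaces.
  const lineLen = (i, j) => prefix[j] - prefix[i] + (j - i - 1);

  // best[k][j] = smallest achievable "longest line" when the first j words
  // are placed on k lines; cut[k][j] = index where the last of those lines starts.
  const best = [];
  const cut = [];
  for (let k = 0; k <= numLines; k++) {
    best.push(new Array(n + 1).fill(Infinity));
    cut.push(new Array(n + 1).fill(-1));
  }
  best[0][0] = 0;

  for (let k = 1; k <= numLines; k++) {
    for (let j = k; j <= n; j++) {
      for (let i = k - 1; i < j; i++) {
        const cost = Math.max(best[k - 1][i], lineLen(i, j));
        if (cost < best[k][j]) {
          best[k][j] = cost;
          cut[k][j] = i;
        }
      }
    }
  }

  // Walk the cut table backwards to recover the lines.
  const lines = [];
  let j = n;
  for (let k = numLines; k >= 1; k--) {
    const i = cut[k][j];
    lines.unshift(words.slice(i, j).join(" "));
    j = i;
  }
  return lines;
}

console.log(splitBalanced("This is a long text", 2)); // [ 'This is a', 'long text' ]
console.log(splitBalanced("This is a long text", 3)); // [ 'This is', 'a long', 'text' ]
```

For proportional fonts, `lineLen` could be replaced by a function that measures rendered width; the rest of the table logic would stay the same.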
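Answer 1 (score: 3)
I don't know about proven, but it seems the simplest and most efficient solution would be to divide the length of the string by N and then find the whitespace closest to each split position (you will want to search both forward and backward).
The code below seems to work, though there are plenty of error conditions it does not handle. It looks like it would run in O(n), where n is the number of strings you want.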
    using System;

    class Program
    {
        static void Main(string[] args)
        {
            var s = "This is a string for testing purposes. It will be split into 3 parts";
            var p = s.Length / 3;                       // ideal length of each part
            var w1 = 0;
            var w2 = FindClosestWordIndex(s, p);        // space nearest to 1/3
            var w3 = FindClosestWordIndex(s, p * 2);    // space nearest to 2/3
            Console.WriteLine(string.Format("1: {0}", s.Substring(w1, w2 - w1).Trim()));
            Console.WriteLine(string.Format("2: {0}", s.Substring(w2, w3 - w2).Trim()));
            Console.WriteLine(string.Format("3: {0}", s.Substring(w3).Trim()));
            Console.ReadKey();
        }

        // Returns the index of the space closest to startIndex,
        // or -1 if the string contains no space at all.
        public static int FindClosestWordIndex(string s, int startIndex)
        {
            int wordAfterIndex = -1;
            int wordBeforeIndex = -1;
            // Search forward for the next space.
            for (int i = startIndex; i < s.Length; i++)
            {
                if (s[i] == ' ')
                {
                    wordAfterIndex = i;
                    break;
                }
            }
            // Search backward for the previous space.
            for (int i = startIndex; i >= 0; i--)
            {
                if (s[i] == ' ')
                {
                    wordBeforeIndex = i;
                    break;
                }
            }
            // If only one side has a space, use it.
            if (wordAfterIndex < 0)
                return wordBeforeIndex;
            if (wordBeforeIndex < 0)
                return wordAfterIndex;
            if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
                return wordAfterIndex;
            else
                return wordBeforeIndex;
        }
    }
The output of this is:
1: This is a string for
2: testing purposes. It will
3: be split into 3 parts
Answer 2 (score: 1)
Again following Brian's answer, I made a PHP version of his code:
    <?php
    // Input text
    $txt = "This is a really long string that should be broken up onto lines of about the same number of characters.";
    // Number of lines
    $numLines = 3;
    /* Do it, result comes as an array: */
    $aResult = splitLinesByClosestWhitespace($txt, $numLines);
    /* Output result: */
    if ($aResult) {
        for ($x = 1; $x <= $numLines; $x++)
            echo "Line ".$x.": ".$aResult[$x]."<br>";
    } else {
        echo "Not enough spaces to generate the lines!";
    }

    /**********************/

    /**
     * Splits a string into multiple lines of the closest possible same length,
     * using the closest whitespaces
     * @param string $txt String to split
     * @param integer $numLines Number of lines
     * @return array|false
     */
    function splitLinesByClosestWhitespace($txt, $numLines)
    {
        $p = intval( strlen($txt) / $numLines );
        $aTxtIndx = array();
        $aTxt = array();
        // Check we have enough whitespaces to generate the number of lines
        $wsCount = count( explode(" ", $txt) ) - 1;
        if ($wsCount < $numLines)
            return false;
        // Get the indexes
        for ($x = 1; $x <= $numLines; $x++) {
            $aTxtIndx[$x] = FindClosestWordIndex($txt, $p * ($x - 1));
        }
        // Do the split. substr() takes a length, not an end index,
        // so the length is the distance to the next split index.
        for ($x = 1; $x <= $numLines; $x++) {
            if ($x != $numLines) {
                $aTxt[$x] = trim( substr( $txt, $aTxtIndx[$x], $aTxtIndx[$x + 1] - $aTxtIndx[$x] ) );
            } else {
                $aTxt[$x] = trim( substr( $txt, $aTxtIndx[$x] ) );
            }
        }
        return $aTxt;
    }

    /**
     * Finds the closest word to a string index
     * @param string $s String to search
     * @param integer $startIndex Index at which to find the closest word
     * @return integer
     */
    function FindClosestWordIndex($s, $startIndex)
    {
        $wordAfterIndex = 0;
        $wordBeforeIndex = 0;
        for ($i = $startIndex; $i < strlen($s); $i++) {
            if ($s[$i] == ' ') {
                $wordAfterIndex = $i;
                break;
            }
        }
        for ($i = $startIndex; $i >= 0; $i--) {
            if ($s[$i] == ' ') {
                $wordBeforeIndex = $i;
                break;
            }
        }
        if ($wordAfterIndex - $startIndex <= $startIndex - $wordBeforeIndex)
            return $wordAfterIndex;
        else
            return $wordBeforeIndex;
    }
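Answer 3 (score: 0)
Partitioning into parts of equal size is NP-complete.
Answer 4 (score: 0)
Using Python code
Answer 5 (score: 0)
The usual way to implement word wrap is to put as many words as possible on one line and move on to the next line when there is no more room. This assumes, of course, that you have a maximum width in mind.
Whichever algorithm you use, keep in mind that unless you are using a fixed-width font, you want to work with the physical width of each word, not the number of letters.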
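The greedy approach described here can be sketched as follows. The function name `wrapGreedy` and the `measure` callback are hypothetical, introduced only for illustration. The default `measure` counts characters, which is only appropriate for a fixed-width font. With GD and Freetype in PHP, the width would presumably come from something like `imagettfbbox` instead.

```javascript
// Greedy word wrap: keep adding words to the current line until the next
// word would exceed maxWidth, then start a new line.
// `measure` is a hypothetical callback returning the rendered width of a
// string; the default counts characters (fixed-width assumption).
function wrapGreedy(text, maxWidth, measure = s => s.length) {
  const words = text.trim().split(/\s+/).filter(w => w.length > 0);
  const lines = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (current !== "" && measure(candidate) > maxWidth) {
      lines.push(current);
      current = word; // a single over-long word still gets its own line
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

console.log(wrapGreedy("This is a long text", 9));  // [ 'This is a', 'long text' ]
console.log(wrapGreedy("This is a long text", 14)); // [ 'This is a long', 'text' ]
```

The second call shows why greedy wrapping alone does not balance the lines: it fills the first line and leaves whatever remains for the last one.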
Answer 6 (score: 0)
Following Brian's answer, I made a JavaScript version of his code: http://jsfiddle.net/gmoz22/CPGY2/.
    // Input text
    var txt = "This is a really long string that should be broken up onto lines of about the same number of characters.";
    // Number of lines
    var numLines = 3;
    /* Do it, result comes as an array: */
    var aResult = splitLinesByClosestWhitespace(txt, numLines);
    /* Output result: */
    if (aResult)
    {
        for (var x = 1; x <= numLines; x++)
            document.write( "Line " + x + ": " + aResult[x] + "<br>" );
    } else {
        document.write("Not enough spaces to generate the lines!");
    }

    /**********************/

    // Original algorithm by http://stackoverflow.com/questions/2381525/algorithm-split-a-string-into-n-parts-using-whitespaces-so-all-parts-have-nearl/2381772#2381772, rewritten for JavaScript by Steve Oziel

    /**
     * Trims a string in older browsers.
     * Defined only if trim() is not already available on String.prototype,
     * since overriding a native method is a huge performance hit.
     */
    if (!String.prototype.trim)
    {
        String.prototype.trim = function(){return this.replace(/^\s+|\s+$/g, '');};
    }

    /**
     * Splits a string into multiple lines of the closest possible same length,
     * using the closest whitespaces
     * @param {string} txt String to split
     * @param {integer} numLines Number of lines
     * @returns {Array|false} Lines indexed from 1, or false if there are too few spaces
     */
    function splitLinesByClosestWhitespace(txt, numLines)
    {
        var p = parseInt(txt.length / numLines);
        var aTxtIndx = [];
        var aTxt = [];
        // Check we have enough whitespaces to generate the number of lines
        var wsCount = txt.split(" ").length - 1;
        if (wsCount < numLines)
            return false;
        // Get the indexes
        for (var x = 1; x <= numLines; x++)
        {
            aTxtIndx[x] = FindClosestWordIndex(txt, p * (x - 1));
        }
        // Do the split
        for (var x = 1; x <= numLines; x++)
        {
            if (x != numLines)
                aTxt[x] = txt.slice(aTxtIndx[x], aTxtIndx[x + 1]).trim();
            else
                aTxt[x] = txt.slice(aTxtIndx[x]).trim();
        }
        return aTxt;
    }

    /**
     * Finds the closest word to a string index
     * @param {string} s String to search
     * @param {integer} startIndex Index at which to find the closest word
     * @returns {integer}
     */
    function FindClosestWordIndex(s, startIndex)
    {
        var wordAfterIndex = 0;
        var wordBeforeIndex = 0;
        // Search forward for the next space
        for (var i = startIndex; i < s.length; i++)
        {
            if (s[i] == ' ')
            {
                wordAfterIndex = i;
                break;
            }
        }
        // Search backward for the previous space
        for (var i = startIndex; i >= 0; i--)
        {
            if (s[i] == ' ')
            {
                wordBeforeIndex = i;
                break;
            }
        }
        if (wordAfterIndex - startIndex <= startIndex - wordBeforeIndex)
            return wordAfterIndex;
        else
            return wordBeforeIndex;
    }
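It works fine when the number of lines wanted is not too close to the number of whitespaces. In the example I gave there are 19 whitespaces, and it starts to go wrong when you ask to break it into 17, 18 or 19 lines. Edits welcome!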
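One possible way around that limitation, offered as a sketch rather than as a fix to the code above: pick each cut from the list of actual space positions, always after the previous cut, and always leaving enough spaces for the cuts still to come. The function name `splitAtDistinctSpaces` is made up for illustration. The sketch also assumes that runs of whitespace may be collapsed to single spaces and that lengths are counted in characters.

```javascript
// Sketch: choose numLines - 1 distinct spaces as cut points, each as close
// as possible to its ideal position, so no line can come out empty.
// Returns null when there are too few spaces for the requested line count.
function splitAtDistinctSpaces(txt, numLines) {
  const s = txt.trim().replace(/\s+/g, " ");
  const spaces = [];
  for (let i = 0; i < s.length; i++) {
    if (s[i] === " ") spaces.push(i);
  }
  if (numLines < 1 || spaces.length < numLines - 1) return null;

  const cuts = [];
  let lo = 0; // first entry of `spaces` that is still available
  for (let k = 1; k < numLines; k++) {
    const target = (s.length * k) / numLines; // ideal position of cut k
    // Last usable entry: keep (numLines - 1 - k) spaces for the later cuts.
    const hi = spaces.length - 1 - (numLines - 1 - k);
    let bestIdx = lo;
    for (let i = lo + 1; i <= hi; i++) {
      if (Math.abs(spaces[i] - target) < Math.abs(spaces[bestIdx] - target)) {
        bestIdx = i;
      }
    }
    cuts.push(spaces[bestIdx]);
    lo = bestIdx + 1;
  }

  // Slice the string at the chosen spaces, dropping the space itself.
  const lines = [];
  let start = 0;
  for (const c of cuts) {
    lines.push(s.slice(start, c));
    start = c + 1;
  }
  lines.push(s.slice(start));
  return lines;
}

console.log(splitAtDistinctSpaces("This is a long text", 3)); // [ 'This is', 'a long', 'text' ]
```

With the 20-word example sentence, asking for 17 to 20 lines should then give exactly that many non-empty lines, because every cut lands on a different space.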