How can I split a string into a given number of lines?

Asked: 2016-08-22 15:51:27

Tags: javascript algorithm line-breaks word-wrap

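Here is my question:

Given a string made up of words separated by spaces, how can I split it into N strings of (roughly) even length, breaking only at spaces?

Here is what I've gathered from my research:

I started by researching word-wrapping algorithms, because it seems to me that this is basically a word-wrapping problem. However, the majority of what I've found so far (and there is a lot out there about word wrapping) assumes that the width of the line is a known input and the number of lines is an output. I want the opposite.

I have found a (very) few questions, such as this one, that seem to be helpful. However, they are all focused on the problem as one of optimization, e.g. how to split a sentence into a given number of lines while minimizing the raggedness of the lines, the wasted whitespace, or whatever else, and doing it in linear (or N log N, or whatever) time. These questions seem mostly unanswered, as the optimization part of the problem is relatively hard.

However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to have the lowest time complexity. I just need a real-world solution that can take a string and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.

Here is what I've come up with: I think I have a workable method for the case where N = 3. I start by putting the first word on the first line and the last word on the last line, and then iteratively move another word onto the first and last lines until my total width (measured by the length of the longest line) stops getting shorter. This usually works, but it gets tripped up if the longest words are in the middle of the line, and it doesn't seem to generalize to more than 3 lines.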

var getLongestHeaderLine = function(headerText) {
  //Utility function definitions
  var getLongest = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a.length > b.length ? a : b;
    });
  };

  var sumOfLengths = function(arrayOfArrays) {
    return arrayOfArrays.reduce(function(a, b) {
      return a + b.length + 1;
    }, 0);
  };

  var getLongestLine = function(lines) {
    return lines.reduce(function(a, b) {
      return sumOfLengths(a) > sumOfLengths(b) ? a : b;
    });
  };

  var getHeaderLength = function(lines) {
    return sumOfLengths(getLongestLine(lines));
  }

  //first, deal with the degenerate cases
  if (!headerText)
    return headerText;

  headerText = headerText.trim();

  var headerWords = headerText.split(" ");

  if (headerWords.length === 1)
    return headerText;

  if (headerWords.length === 2)
    return getLongest(headerWords);

  //If we have more than 2 words in the header,
  //we need to split them into 3 lines
  var firstLine = headerWords.splice(0, 1);
  var lastLine = headerWords.splice(-1, 1);
  var lines = [firstLine, headerWords, lastLine];

  //The header length is the length of the longest
  //line in the header. We will keep iterating
  //until the header length stops getting shorter.
  var headerLength = getHeaderLength(lines);
  var lastHeaderLength = headerLength;
  while (true) {
    //Take the first word from the middle line,
    //and add it to the first line
    firstLine.push(headerWords.shift());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.unshift(firstLine.pop());
      break;
    }
    //Take the last word from the middle line,
    //and add it to the last line
    lastHeaderLength = headerLength;
    lastLine.unshift(headerWords.pop());
    headerLength = getHeaderLength(lines);
    if (headerLength > lastHeaderLength || headerWords.length === 0) {
      //If we stopped getting shorter, undo
      headerWords.push(lastLine.shift());
      break;
    }
    lastHeaderLength = headerLength;
  }

  return getLongestLine(lines).join(" ");
};

debugger;
var header = "an apple a day keeps the doctor away";

var longestHeaderLine = getLongestHeaderLine(header);
debugger;
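Edit: I tagged this javascript because ultimately I'd like a solution I can implement in that language. However, that isn't critical to the problem, and I'd take any solution that works.

Edit #2: While performance isn't what I'm most concerned about, I do need to be able to run whatever solution I come up with about 100-200 times, on strings that can be up to about 250 characters long. This will be done during page load, so it can't take forever. For example, I've found that trying to offload this problem to the rendering engine, by putting each string into a DIV and playing with the dimensions, doesn't work, since it is (seemingly) very expensive to measure rendered elements.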

4 answers:

Answer 0 (score: 2)

Try this. It should do the job for any reasonable N:

function format(srcString, lines) {
  var target = "";
  var  arr =  srcString.split(" ");
  var c = 0;
  var MAX = Math.ceil(srcString.length / lines);
  for (var i = 0, len = arr.length; i < len; i++) {
    var cur = arr[i];
    if (c + cur.length > MAX) {
      target += '\n' + cur;
      c = cur.length;
    }
    else {
      if (target.length > 0)
        target += " ";
      target += cur;
      c += cur.length;
    }
  }
  return target;
}

alert(format("this is a very very very very " +
             "long and convoluted way of creating " +
             "a very very very long string",7));

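The function above returns a single newline-joined string and can produce more lines than requested, because each line is cut as soon as it would pass the average length. As a rough sketch of one alternative (the name `splitIntoLines` and the midpoint rule are assumptions for illustration, not part of the answer), the variant below returns an array, never exceeds `lineCount` lines, and breaks before a word when that word's midpoint falls past the ideal end of the current line:

```javascript
// Hypothetical helper, not from the answer above.
// Splits `text` into at most `lineCount` lines, breaking only at whitespace.
function splitIntoLines(text, lineCount) {
  var words = text.trim().split(/\s+/).filter(Boolean);
  var total = words.join(" ").length; // length with single spaces
  var lines = [];
  var current = [];
  var pos = 0; // index in the joined string where the next word starts

  words.forEach(function (word) {
    // Ideal end position of the line currently being filled.
    var target = (total * (lines.length + 1)) / lineCount;
    var midpoint = pos + word.length / 2;
    // Never emit an empty line, and keep everything left on the last line.
    var canBreak = current.length > 0 && lines.length < lineCount - 1;
    if (canBreak && midpoint > target) {
      lines.push(current.join(" "));
      current = [];
    }
    current.push(word);
    pos += word.length + 1; // +1 for the space that follows the word
  });

  if (current.length > 0) {
    lines.push(current.join(" "));
  }
  return lines;
}

console.log(splitIntoLines("an apple a day keeps the doctor away", 3));
// [ 'an apple a', 'day keeps the', 'doctor away' ]
```

This stays a single pass over the words, so running it a few hundred times on 250-character strings should be cheap. It makes no attempt to minimize raggedness.
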
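Answer 1 (score: 1)

You might want to give this solution a try, using canvas. It needs optimization and is only a quick shot, but I think canvas might be a good idea because you can calculate real widths. You can also adjust the font to the one actually used, and so on. Important to note: this is not the most performant way of doing things. It will create a lot of canvases.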

var t = `However, I don't care that much about optimization. As long as the lines are (in most cases) roughly even, I'm fine if the solution doesn't work in every single edge case, or can't be proven to be the least time complexity. I just need a real world solution that can take a string, and a number of lines (greater than 2), and give me back an array of strings that will usually look pretty even.`;


// Measure the rendered width of `text` in pixels using a canvas 2D context.
function getTextTotalWidth(text) {
  var canvas = document.createElement("canvas");
  var ctx = canvas.getContext("2d");
  ctx.font = "12px Arial";
  ctx.fillText(text, 0, 12);
  return ctx.measureText(text).width;
}

// Target width of each line.
function getLineWidth(lines, totalWidth) {
  return totalWidth / lines;
}

// Average width of a single non-whitespace character.
function getAverageLetterSize(text) {
  var t = text.replace(/\s/g, "").split("");
  var sum = t.map(function(d) {
    return getTextTotalWidth(d);
  }).reduce(function(a, b) { return a + b; });
  return sum / t.length;
}

function getLines(text, numberOfLines) {
  var lineWidth = getLineWidth(numberOfLines, getTextTotalWidth(text));
  var letterWidth = getAverageLetterSize(text);
  var t = text.split("");
  return createLines(t, letterWidth, lineWidth);
}

// Walk the characters and replace the first space found after the
// estimated width reaches lineWidth with a line break.
function createLines(t, letterWidth, lineWidth) {
  var i = 0;
  var res = t.map(function(d) {
    if (i < lineWidth || d != " ") {
      i += letterWidth;
      return d;
    }
    i = 0;
    return "<br />";
  });
  return res.join("");
}

var div = document.createElement("div");
div.innerHTML = getLines(t, 7);
document.body.appendChild(div);
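
Since the answer notes that creating many canvases is inefficient, here is a minimal sketch of one way to reduce that cost: reuse a single 2D context and cache measured widths. The names `createMeasurer` and `splitByWidth` are hypothetical, and the splitting rule (start a new line once the measured width would pass the average line width) is a simple greedy assumption rather than the answer's method. `ctx` can be any object with a `font` property and a `measureText(text)` method returning `{ width }`, such as a `CanvasRenderingContext2D`.

```javascript
// Hypothetical helper: wraps one context and caches widths per string.
function createMeasurer(ctx, font) {
  var cache = Object.create(null);
  ctx.font = font;
  return function measure(text) {
    if (!(text in cache)) {
      cache[text] = ctx.measureText(text).width;
    }
    return cache[text];
  };
}

// Hypothetical helper: greedy split into at most `lineCount` lines,
// using measured widths instead of character counts.
function splitByWidth(text, lineCount, measure) {
  var words = text.trim().split(/\s+/).filter(Boolean);
  var targetWidth = measure(words.join(" ")) / lineCount;
  var lines = [];
  var current = "";

  words.forEach(function (word) {
    var candidate = current === "" ? word : current + " " + word;
    // The last allowed line takes whatever is left.
    var canBreak = current !== "" && lines.length < lineCount - 1;
    if (canBreak && measure(candidate) > targetWidth) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  });

  if (current !== "") {
    lines.push(current);
  }
  return lines;
}

// In a browser, the context could come from a single off-screen canvas:
//   var ctx = document.createElement("canvas").getContext("2d");
//   var measure = createMeasurer(ctx, "12px Arial");
//   var lines = splitByWidth(t, 7, measure);
```

Because this greedy rule cuts each line just below the average width, the last line can end up longer than the others.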

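Answer 2 (score: 0)

(Adapted from here: How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)

If we consider the word lengths as a list of numbers, we can binary search the partition.

Our max length ranges from 0 to sum(word-length list) + (num words - 1), where the second term accounts for the spaces; mid = (range / 2). We check whether mid can be achieved by partitioning into N sets in O(m) time: traverse the list, adding (word_length + 1) to the current part while the current sum is less than or equal to mid. When the sum passes mid, start a new part. If the result contains N or fewer parts, mid is achievable.

If mid can be achieved, try a lower range; otherwise, a higher range. The time complexity is O(m log num_chars). (You'll also have to consider how removing one space per part, i.e. where the line break goes, figures into the calculation.)

JavaScript code (adapted from http://articles.leetcode.com/the-painters-partition-problem-part-ii):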

function getK(arr,maxLength) {
  var total = 0,
      k = 1;

  for (var i=0; i<arr.length; i++) {
    total += arr[i] + 1;

    if (total > maxLength) {
      total = arr[i];
      k++;
    }
  }

  return k;
}
 

function partition(arr,n) {
  var lo = Math.max(...arr),
      hi = arr.reduce((a,b) => a + b); 

  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);

    var k = getK(arr,mid);

    if (k <= n){
      hi = mid;

    } else{
      lo = mid + 1;
    }
  }

  return lo;
}

var s = "this is a very very very very "
      + "long and convoluted way of creating "
      + "a very very very long string",
    n = 7;

var words = s.split(/\s+/),
    maxLength = partition(words.map(x => x.length),7);

console.log('max sentence length: ' + maxLength);
console.log(words.length + ' words');
console.log(n + ' lines')
console.log('')

var i = 0;

for (var j=0; j<n; j++){
  var str = '';
  
  while (true){
    if (!words[i] || str.length + words[i].length > maxLength){
      break
    }
    str += words[i++] + ' ';
  }
  console.log(str);
}
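
The caveat above about spaces can be sidestepped by measuring whole candidate lines rather than summing word lengths. The following is only a sketch of that idea, with hypothetical names (`wrapToWidth`, `splitMinimizingLongest`): it binary searches for the smallest maximum line length at which a greedy wrap fits in `lineCount` lines, counting the spaces between words and no trailing space, and returns the lines as an array.

```javascript
// Hypothetical helper: greedy wrap so that no line exceeds `maxLength`
// characters (assumes every single word fits within maxLength).
function wrapToWidth(words, maxLength) {
  var lines = [];
  var current = "";
  for (var i = 0; i < words.length; i++) {
    var candidate = current === "" ? words[i] : current + " " + words[i];
    if (current !== "" && candidate.length > maxLength) {
      lines.push(current);
      current = words[i];
    } else {
      current = candidate;
    }
  }
  if (current !== "") {
    lines.push(current);
  }
  return lines;
}

// Hypothetical helper: binary search on the maximum line length.
function splitMinimizingLongest(text, lineCount) {
  var words = text.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) {
    return [];
  }
  // Lower bound: the longest word must fit. Upper bound: everything on one line.
  var lo = Math.max.apply(null, words.map(function (w) { return w.length; }));
  var hi = words.join(" ").length;

  while (lo < hi) {
    var mid = lo + ((hi - lo) >> 1);
    if (wrapToWidth(words, mid).length <= lineCount) {
      hi = mid; // mid is achievable, try something narrower
    } else {
      lo = mid + 1; // too narrow, needs more lines than allowed
    }
  }
  return wrapToWidth(words, lo);
}

console.log(splitMinimizingLongest("an apple a day keeps the doctor away", 3));
// [ 'an apple a', 'day keeps the', 'doctor away' ]
```

Note that the result can contain fewer than `lineCount` lines when no narrower width helps, for example when one very long word dominates.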

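Answer 3 (score: 0)

Sorry, this is C#. I had already created my project by the time you updated your post with the Javascript tag.

Since you said all you care about is roughly equal line lengths... I came up with this. Sorry for the simplistic approach.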

    private void DoIt() {

        List<string> listofwords = txtbx_Input.Text.Split(' ').ToList();
        int totalcharcount = 0;
        int neededLineCount = int.Parse(txtbx_LineCount.Text);

        foreach (string word in listofwords)
        {
            totalcharcount = totalcharcount + word.Count(char.IsLetter);
        }

        int averagecharcountneededperline = totalcharcount / neededLineCount;
        List<string> output = new List<string>();
        int positionsneeded = 0;

        while (output.Count < neededLineCount)
        {
            string tempstr = string.Empty;
            while (positionsneeded < listofwords.Count)
            {
                tempstr += " " + listofwords[positionsneeded];
                if ((positionsneeded != listofwords.Count - 1) && (tempstr.Count(char.IsLetter) + listofwords[positionsneeded + 1].Count(char.IsLetter) > averagecharcountneededperline))//if (this is not the last word) and (we are going to bust the average)
                {
                    if (output.Count + 1 == neededLineCount)//if we are writing the last line
                    {
                        //who cares about exceeding.
                    }
                    else
                    {
                        //we're going to exceed the allowed average, gotta force this loop to stop
                        positionsneeded++;//dont forget!
                        break;
                    }
                }
                positionsneeded++;//increment the needed position by one
            }

            output.Add(tempstr);//store the string in our list of string to output
        }

        //display the line on the screen
        foreach (string lineoftext in output)
        {
            txtbx_Output.AppendText(lineoftext + Environment.NewLine);
        }

    }