Question

I have a very long string in JavaScript that needs to be processed by a number of functions.

Call the string:

 var str;

Each subsequent function starts where the previous one left off, so I keep a variable strPos to indicate where I am in the string.

Each function returns the new position along the string, i.e.

function MyStringFunction(str, strPos){
    /* Does some fantastic work on the str without changing it */

    /* Say this function advances strPos by 10 characters, so we return */
    return (10 + strPos);
}

Is this the best and fastest way of doing things?

Or should I shorten the string instead?

// NOW RETURNS THE SHORTENED STRING MINUS THE STUFF I HAVE NOW WORKED ON
// strPos is now always the start of the string, as that is now where I left off
function MyStringFunction(str){
    /* Does some fantastic work on the str without changing it */

    /* Say this function works on 10 characters, so we return the rest */
    return str.slice(10);
}

Which approach is fastest? Note that the string starts out at roughly 20,000 characters.

Answer 1

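Of course, performance depends on the JS engine implementation and on the optimizations it may apply. In theory, though, traversing the string with an index-based approach should be faster. The key point is that a string in JS is an immutable, index-addressable sequence of unsigned 16-bit integers. From this I draw two simple conclusions: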

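The index-based part means we can expect O(1) access to any element of the string;
The immutable part means that even if someone manages to structure the whole computation as a chain of function calls, every call still has to spend time creating a substring. So that approach does not make much sense to me.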
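
As a minimal sketch of those two points (the `sample` string and the variable names are hypothetical, chosen only for illustration): reading by index does not copy anything, while `slice` hands back a new string value and leaves the original as it was.

```javascript
// Hypothetical sample string, not taken from the question.
const sample = 'hello world';

// Index-based access: read the UTF-16 code unit at a given position directly.
const codeUnit = sample.charCodeAt(6); // position 6 holds 'w'

// Immutability: slice() returns a new string; the original is left untouched.
const rest = sample.slice(6);

console.log(codeUnit, rest, sample);
```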

Here is a very basic benchmark I put together. It is nothing compared to rigorous testing, of course, but it will do:

const INIT = {
  str: 'x'.repeat(10e6),
  i: 0,
  startTime: 0,
};

let t = {};

const setup = () => {
  t = { ...INIT, startTime: Date.now() };
}

const fooIndexApproach = (str, idx) => {
  const upto = idx + 10;
  for (let i = idx; i < upto; i++) str.charCodeAt(i);
  return upto;
}
const fooSubStrApproach = (str) => {
  for (let i = 0; i < 10; i++) str.charCodeAt(i);
  return str.slice(10);
}


// Warm-up pass: its timing is discarded, because the engine may still be
// optimizing the code during the first run.
setup();
for (let j = 0; j < 10e3; j++);
while ((t.i = fooIndexApproach(t.str, t.i)) < t.str.length) { };

setup();
for (let j = 0; j < 10e3; j++);
while ((t.i = fooIndexApproach(t.str, t.i)) < t.str.length) { };
console.log(`fooIndexApproach: ${(Date.now() - t.startTime)}ms`);


// Warm-up pass: its timing is discarded, because the engine may still be
// optimizing the code during the first run.
setup();
for (let j = 0; j < 10e3; j++);
while (t.str = fooSubStrApproach(t.str)) { };

setup();
for (let j = 0; j < 10e3; j++);
while (t.str = fooSubStrApproach(t.str)) { };
console.log(`fooSubStrApproach: ${(Date.now() - t.startTime)}ms`);

// ==============================================
// RESULTS on my Mac:
// substring VS indexed => [[51, 19], [51, 20]]ms
// indexed VS substring => [[31, 52], [32, 52]]ms

I must say, the numbers are fairly self-explanatory. The index-based approach wins, which is no surprise.
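
For completeness, here is a rough sketch of how several functions could share one position in the way the question describes. The helpers `skipSpaces`, `readWord` and `tokenize` are hypothetical and not part of the benchmark above; the only assumption is that each step takes `(str, pos)` and returns the new position.

```javascript
// Hypothetical helpers: each takes (str, pos) and returns the new position.
const skipSpaces = (str, pos) => {
  while (pos < str.length && str.charCodeAt(pos) === 32) pos++;
  return pos;
};

const readWord = (str, pos, out) => {
  const start = pos;
  while (pos < str.length && str.charCodeAt(pos) !== 32) pos++;
  // A substring is created only for the piece that is actually needed.
  if (pos > start) out.push(str.slice(start, pos));
  return pos;
};

const tokenize = (str) => {
  const words = [];
  let pos = 0;
  // The full string is passed along unchanged; only the position moves.
  while (pos < str.length) {
    pos = skipSpaces(str, pos);
    pos = readWord(str, pos, words);
  }
  return words;
};

console.log(tokenize('  one two   three '));
```

The original string is never shortened here; a new string is allocated only when a piece of it has to be kept.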
