Question

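I am trying to solve this problem: given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number, i.e. the one whose second occurrence has the smallest index.

Here is my solution: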

function firstDuplicate(a) {
  for (let i = 0; i < a.length; i++) {
    if (a.indexOf(a[i]) !== i) {
      return a[i];
    }
  }

  return -1;
}

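The problem: one of the acceptance criteria is that the algorithm should find the first duplicate value in under 4 seconds, which I cannot achieve when the input array is huge. I tested with an input array containing 100k items, and my algorithm took more than 5 seconds. Can someone help me tweak my code so that it finishes in under 4 seconds?

Many thanks!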

Answer 1

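You have to iterate over the array and collect the elements into a temporary object that uses the number (the element) as the key and a boolean as the value.

On each iteration, check whether the temporary object already has that key.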

const bigArray = [];


for(let i = 0; i<1000000; i++) {
  bigArray.push(i);
}


for(let i = 0; i<1000000; i++) {
  bigArray.push(Math.floor(Math.random() * 1000000));
}


const firstDuplicateInArray = array => {
  const temp = {};
  for (let i = 0; i < array.length; i++) {
    if (temp[array[i]] === true) {
      return array[i];
    }
    temp[array[i]] = true;
  }
  return -1;
};

const start = new Date().getTime();
console.log('Time start:', start);

console.log('Found 1st duplicate:', firstDuplicateInArray(bigArray));

const end = new Date().getTime();
console.log('Time end:', end);

console.log('Time taken:', end - start, 'milliseconds');

P.S. Using a Set is more than 2x slower (depending on the array size):

const bigArray = [];


for(let i = 0; i<1000000; i++) {
  bigArray.push(i);
}


for(let i = 0; i<1000000; i++) {
  bigArray.push(Math.floor(Math.random() * 1000000));
}


function firstDuplicate(a) {
  const r = new Set();
  for (let e of a) {
    if (r.has(e)) return e;
    else r.add(e);
  }
  return -1;
}

const start = new Date().getTime();
console.log('Time start:', start);

console.log('Found 1st duplicate:', firstDuplicate(bigArray));

const end = new Date().getTime();
console.log('Time end:', end);

console.log('Time taken:', end - start, 'milliseconds');

Answer 2

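Using a Set means paying for hashing and possible key collisions. Because you know the values are integers in a bounded range, the fastest approach is direct indexing, which gives O(1) lookup time rather than O(lg n). A straightforward implementation needs 2*n storage, though. If you are allowed to modify the input array, you can use it as the workspace: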

// No extra memory version.
// Negate value at index of seen number to store seen-ness.
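// Assumes only numbers in the range from 1 to a.length allowed in array `a`.
function firstDuplicateNew(a) {
  for (let i = 0; i < a.length; i++) {
    // a[i] may already have been negated as a marker, so use its absolute value.
    const v = Math.abs(a[i]);
    if (a[v - 1] < 0) {
      // Return the positive value, not the possibly negated a[i].
      return v;
    }
    a[v - 1] = -a[v - 1];
  }
  return -1;
}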

// OP's Proposed faster version using Set.
function firstDuplicateSet(a) {
  const r = new Set();
  for (const e of a) {
    if (r.has(e)) return e;
    else r.add(e);
  }
  return -1;
}

// Another posted version.
const firstDuplicateInArray = array => {
  const temp = {};
  for (let i = 0; i < array.length; i++) {
    if (temp[array[i]] === true) {
      return array[i];
    }
    temp[array[i]] = true;
  }
  return -1;
};

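const l = 5e6;
const a = [];
// for (let i = 0; i < l; i++) { a.push(Math.floor(Math.random() * l) + 1); }
for (let i = 0; i < l; i++) { a[i] = i + 1; }
a[l - 1] = 7; // Worst case: the only duplicate is the very last element.

// firstDuplicateNew mutates `a`, so it has to run last.
for (const f of [firstDuplicateSet, firstDuplicateInArray, firstDuplicateNew]) {
  const then = Date.now();
  const result = f(a);
  const now = Date.now();
  console.log(f.name ? f.name : '-');
  console.log('Len:', a.length);
  console.log('Value:' + result);
  console.log('Time:', now - then + 'ms');
}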

This appears to run faster than the other versions.
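One caveat with the negate-in-place trick is that it leaves the input array with negated entries after it returns. If the caller still needs the original data, a second pass can undo the marks. The following is only a sketch under the same assumption (values are integers from 1 to a.length); `firstDuplicateRestoring` is a hypothetical name, not part of the benchmark above.

```javascript
// Sketch: same negate-as-marker idea, but the input is restored before returning.
// Assumes every value in `a` is an integer in the range 1..a.length.
function firstDuplicateRestoring(a) {
  let result = -1;
  for (let i = 0; i < a.length; i++) {
    const v = Math.abs(a[i]); // a[i] may already carry a marker sign
    if (a[v - 1] < 0) {
      result = v; // v was seen before
      break;
    }
    a[v - 1] = -a[v - 1]; // mark v as seen
  }
  // Undo every marker so the caller gets the array back unchanged.
  for (let i = 0; i < a.length; i++) {
    if (a[i] < 0) a[i] = -a[i];
  }
  return result;
}

const sample = [2, 1, 3, 5, 3, 2];
console.log(firstDuplicateRestoring(sample)); // 3
console.log(sample); // [ 2, 1, 3, 5, 3, 2 ]
```

The extra pass costs another O(n), so this trades a little time for leaving the input intact.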

Answer 3

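If a fast processing time is a must, I think it is worth spending some memory on the algorithm:

Just create a reverse map: an array as large as the range of the stored numbers. Then walk through the input array and, for each number, store its index in the reverse map. When you find that a number has already been indexed, you have your duplicate.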
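A minimal sketch of that reverse-map idea might look like the following. It assumes, as in the question, that every value is an integer from 1 to a.length; the function name `firstDuplicateReverseMap` and the choice of a `Uint32Array` are illustrative assumptions, not something this answer specifies.

```javascript
// Sketch of the reverse map: one slot per possible value.
// Assumes every value in `a` is an integer in the range 1..a.length.
function firstDuplicateReverseMap(a) {
  // seenAt[v] stores (index + 1) of the first occurrence of v; 0 means "not seen yet".
  const seenAt = new Uint32Array(a.length + 1);
  for (let i = 0; i < a.length; i++) {
    const v = a[i];
    if (seenAt[v] !== 0) {
      return v; // v was already indexed, so this is the earliest second occurrence
    }
    seenAt[v] = i + 1;
  }
  return -1;
}

console.log(firstDuplicateReverseMap([2, 1, 3, 5, 3, 2])); // 3
console.log(firstDuplicateReverseMap([1, 2, 3]));          // -1
```

Storing index + 1 lets the zero-initialised typed array double as the "not seen" marker, so no separate fill step is needed.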

Answer 4

function firstDuplicate(a) {
  const r = new Set();
  for (const e of a) {
    if (r.has(e)) return e;
    else r.add(e);
  }
  return -1;
}

This is how I solved it.

Answer 5

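Instead of using indexOf, which runs in O(n), use a dictionary to store key/value pairs, where the key is the number and the value is its index. A lookup takes O(1) time, and you only need to pass through the array once. If a key has an undefined value, you have not seen it yet; otherwise, the first key you find that already has an actual value must be the first duplicate, and that value is its smallest index.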
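As a rough illustration of the dictionary approach, the sketch below uses a `Map` from each number to the index of its first occurrence. The name `firstDuplicateWithIndex` and the shape of the returned object are assumptions made for this example only.

```javascript
// Sketch: a dictionary keyed by number, holding the index of its first occurrence.
function firstDuplicateWithIndex(a) {
  const firstIndexOf = new Map(); // number -> index where it was first seen
  for (let i = 0; i < a.length; i++) {
    const prev = firstIndexOf.get(a[i]);
    if (prev !== undefined) {
      // The key already has a value, so a[i] is the first duplicate.
      return { value: a[i], firstIndex: prev, secondIndex: i };
    }
    firstIndexOf.set(a[i], i);
  }
  return null; // no duplicate found
}

console.log(firstDuplicateWithIndex([2, 1, 3, 5, 3, 2]));
// { value: 3, firstIndex: 2, secondIndex: 4 }
```

Returning `-1` instead of `null` would match the other answers; the object form is used here only to show that the stored index is available at no extra cost.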

Find the first duplicate element in a large array of numbers
