Fast way to count and remove duplicates in an array

Date: 2019-02-23 08:29:14

Tags: javascript arrays sorting duplicates

I have an array that contains duplicates

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

I want to get rid of the duplicates (case-insensitively) and create a new array that counts them.

In one of the answers I saw this function:

function count_array(arr) {
    var a = [], b = [], prev;

    arr.sort();
    for (var i = 0; i < arr.length; i++) {
        if (arr[i] !== prev) {
            a.push(arr[i]);
            b.push(1);
        } else {
            b[b.length - 1]++;
        }
        prev = arr[i];
    }
    return [a, b];
}

It returns two arrays:

First array: ["String 1", "String 2", "STRING 1", "String 3"]
Second array: [2, 2, 1, 1]

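It is case-sensitive; I want all instances of String 1, STRING 1, string 1, StRING 1 to be treated as String 1.

Is there a better way to do this for large arrays, for example an array of length 10K?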

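5 answers:

Answer 0: (score: 2)

.sort() is an O(N log N) operation. If you need the result sorted and you are worried about speed, do the sort at the very end. If you don't need the result sorted, you can use a Set (or Map) to check for duplicates, rather than checking a sorted array for equal items at adjacent indices.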

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
function count_array(arr) {
  const result = [];
  const map = new Map();
  arr.forEach((str) => {
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  });
  console.log([...map.values()]);
  return result.sort();
}
console.log(count_array(array));

If you want, you can use a for loop instead of forEach. A for loop will be slightly faster, though a bit harder to read IMO:

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
function count_array(arr) {
  const result = [];
  const map = new Map();
  for (let i = 0, { length } = arr; i < length; i++) {
    const str = arr[i];
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  }
  console.log([...map.values()]);
  return result.sort();
}
console.log(count_array(array));
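
The question asks for two arrays, while the functions above return the unique strings and only log the counts. A minimal sketch of a variant that returns both, in first-seen order, might look like this. The name countUnique and the { first, count } entry shape are illustrative assumptions, not part of the original answer:

```javascript
// Hypothetical helper: returns [uniqueStrings, counts], case-insensitively.
function countUnique(arr) {
  // A Map keeps insertion order, so results follow first appearance.
  const seen = new Map();
  for (const str of arr) {
    const key = str.toLowerCase();
    const entry = seen.get(key);
    if (entry) {
      entry.count++;
    } else {
      // Remember the first spelling encountered for this key.
      seen.set(key, { first: str, count: 1 });
    }
  }
  const entries = [...seen.values()];
  return [entries.map((e) => e.first), entries.map((e) => e.count)];
}

const [unique, counts] = countUnique(
  ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
);
console.log(unique); // ["String 1", "string 2", "String 3"]
console.log(counts); // [3, 2, 1]
```

Sorting, if it is needed at all, can then be applied to the much shorter result instead of the full input.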

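Answer 1: (score: 2)

Reduce the array of strings to an object, using the normalized strings as keys and the number of occurrences as values. Use Object.keys() to get the first array and Object.values() to get the second: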

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

const counts = array.reduce((r, s) => {
  const key = s[0].toUpperCase() + s.substring(1).toLowerCase();
  
  r[key] = (r[key] || 0) + 1;
  
  return r;
}, {});

const first = Object.keys(counts);
const second = Object.values(counts);

console.log(first);
console.log(second);

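To get the results sorted by the number of duplicates, use Object.entries() to convert the result of the reduce into an array of pairs, then sort by the second value (the count). To get the two arrays, use Array.map():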

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

const counts = Object.entries(array.reduce((r, s) => {
  const key = s[0].toUpperCase() + s.substring(1).toLowerCase();
  
  r[key] = (r[key] || 0) + 1;
  
  return r;
}, {}))
.sort(([, a], [, b]) => b - a);

const first = counts.map(([s]) => s);
const second = counts.map(([, n]) => n);

console.log(first);
console.log(second);
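
One caveat with a plain object as the counter: integer-like keys (for example "10" or "2") are enumerated in ascending numeric order ahead of the other keys, and s[0] is undefined for an empty string. A sketch of the same reduce backed by a Map, which keeps insertion order for every key, could look like this. The capitalize helper, the countWithMap name and the sample input are illustrative assumptions:

```javascript
// Hypothetical helper: capitalizes the first character and tolerates "".
const capitalize = (s) => s.charAt(0).toUpperCase() + s.slice(1).toLowerCase();

// Hypothetical variant of the reduce above, backed by a Map.
function countWithMap(arr) {
  const counts = arr.reduce((m, s) => {
    const key = capitalize(s);
    // Map.prototype.set returns the map, so it can serve as the accumulator.
    return m.set(key, (m.get(key) || 0) + 1);
  }, new Map());
  return [[...counts.keys()], [...counts.values()]];
}

const [first, second] = countWithMap(["String 1", "10", "string 1", "2", "10"]);
console.log(first);  // ["String 1", "10", "2"]
console.log(second); // [2, 2, 1]
```

With a plain object, the same input would list "2" and "10" before "String 1".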

Answer 2: (score: 1)

You could use a few small functions and filter on the normalized values while counting them.

const
    normalize = s => s.toLowerCase(),
    getFirst = a => a,
    mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
    array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
    map = new Map,
    array1 = array.filter(v => (k => getFirst(!map.has(k), mapCount(map, k)))(normalize(v))),
    array2 = Array.from(map.values());

console.log(array1);
console.log(array2);
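
The filter callback above is dense: it relies on the arguments of getFirst being evaluated left to right, so map.has(k) is read before mapCount updates the map. An expanded sketch that should behave the same way is shown below. The function name uniqueWithCounts is an assumption for illustration:

```javascript
// Hypothetical expanded form of the one-liner above.
function uniqueWithCounts(arr) {
  const map = new Map();
  const unique = arr.filter((value) => {
    const key = value.toLowerCase();
    const isFirst = !map.has(key);          // check before counting
    map.set(key, (map.get(key) || 0) + 1);  // then count this occurrence
    return isFirst;                         // keep only first occurrences
  });
  return [unique, Array.from(map.values())];
}

const [array1, array2] = uniqueWithCounts(
  ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
);
console.log(array1); // ["String 1", "string 2", "String 3"]
console.log(array2); // [3, 2, 1]
```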

If you are happy with the normalized strings as the result set, you could take this approach instead.

const
    normalize = s => s.toLowerCase(),
    mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
    array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
    map = array.reduce((m, v) => mapCount(m, normalize(v)), new Map),
    array1 = Array.from(map.keys()),
    array2 = Array.from(map.values());

console.log(array1);
console.log(array2);

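Answer 3: (score: 0)

If you are asking for the fastest way to do this, it should be asymptotically O(N):

  1. First, you need a hash map to store all the strings seen so far;
  2. Second, you need to iterate over the given array, putting its values into the hash map;
  3. Finally, each time you meet a string that is already in the hash map, you increment its count.

It can be implemented like this: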

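function countAndFilter(arr) {
    // Object.create(null) has no prototype, so `in` cannot match inherited keys such as "constructor"
    const map = Object.create(null);

    for (let i = 0; i <= arr.length - 1; i++) {
        const str = arr[i].toLowerCase();

        if (str in map) {
            map[str]++;

            // keep in mind that removing an element from an array costs O(N),
            // so the duplicate is only marked here (note: this mutates the input array)
            arr[i] = undefined;
        } else {
            map[str] = 1;
        }
    }

    // now you have the hash map that represents all strings and their numbers of appearances in the given array
    // (doSomething is a placeholder for whatever consumes the counts)
    doSomething(map);

    // finally return filtered result
    return arr.filter(str => str !== undefined);
}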
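
To see how a linear pass compares with a sort-based count on an array of roughly the size mentioned in the question, a rough timing harness such as the following could be used. The sample generator, the function names and the 10,000-element size are assumptions for illustration, and timings will vary by engine and machine:

```javascript
// Hypothetical sample data: n strings in mixed case, 100 distinct values ignoring case.
function makeSample(n) {
  const pool = ["String", "string", "STRING", "StRING"];
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push(pool[i % pool.length] + " " + (i % 100));
  }
  return out;
}

// Sort-based count, O(N log N): sorts a lowercased copy and compares neighbours.
function countBySort(arr) {
  const sorted = arr.map((s) => s.toLowerCase()).sort();
  const keys = [];
  const counts = [];
  let prev;
  for (const s of sorted) {
    if (s !== prev) {
      keys.push(s);
      counts.push(1);
    } else {
      counts[counts.length - 1]++;
    }
    prev = s;
  }
  return { keys, counts };
}

// Map-based count, O(N): a single pass with no sorting.
function countByMap(arr) {
  const map = new Map();
  for (const s of arr) {
    const key = s.toLowerCase();
    map.set(key, (map.get(key) || 0) + 1);
  }
  return { keys: [...map.keys()], counts: [...map.values()] };
}

// Runs fn once and logs the elapsed wall-clock time in milliseconds.
function time(label, fn) {
  const start = Date.now();
  const result = fn();
  console.log(label + ": " + (Date.now() - start) + " ms");
  return result;
}

const sample = makeSample(10000);
const bySort = time("sort-based", () => countBySort(sample));
const byMap = time("map-based", () => countByMap(sample));
console.log(bySort.keys.length === byMap.keys.length); // same number of unique strings
```

A single run on 10K items is too short to measure reliably, so repeating each call many times and averaging would give a steadier comparison.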

Answer 4: (score: 0)

This can be done concisely with Array.reduce() to create a map whose keys are the lowercased array items and whose values are their counts. Then use Object.keys() to get the unique items and Object.values() to get the counts:

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];

const map = array.reduce((acc, x) => {
  const xLower = x.toLocaleLowerCase();
  acc[xLower] = (acc[xLower] || 0) + 1;
  return acc;
}, {});

console.log(map);
console.log(Object.keys(map));
console.log(Object.values(map));