Question

I'm building an application that identifies duplicate and unique data in a JSON file, and I want to output the number of unique records.

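I have a JSON object containing many first and last names. I want to be able to identify duplicate data, but records whose names are merely similar should also be identified as the same. For example: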

 [
   {FirstName: 'Joshua', LastName: 'smith'},
   {FirstName: 'Joshuaa', LastName: 'smith'}
 ]

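As you can see above, the second object has an extra "a", but I want it to be treated as the same data as the first object. So basically, I want to account for typos in the FirstName and LastName data.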

I thought about using regular expressions, but I don't know where I would use them.

Answer 1

You can do something like the following, with a threshold for how many differences still count as similar. In this example I set it to 1:

const array = [
    { FirstName: 'Joshua', LastName: 'smith' },
    { FirstName: 'Joshuaa', LastName: 'smith' }
];

const THRESHOLD = 1;

// Compare `document` against every other record in `array`
const compareCollections = (document) => {
    array.forEach(element => {
        // Skip the record itself, otherwise it is always reported as similar
        if (element === document) {
            return;
        }

        let consideredSimilar = false;

        if (element.FirstName === document.FirstName) {
            // First names match, so any typo must be in the last name
            if (_checkDifferences(element.LastName, document.LastName) <= THRESHOLD) {
                console.log('SIMILAR LASTNAME');
                consideredSimilar = true;
            }
        } else if (element.LastName === document.LastName) {
            // Last names match, so any typo must be in the first name
            if (_checkDifferences(element.FirstName, document.FirstName) <= THRESHOLD) {
                console.log('SIMILAR FIRSTNAME');
                consideredSimilar = true;
            }
        }

        console.log('CONSIDERED SIMILAR: ', consideredSimilar);
    });
};

// Count the positions at which the two strings differ; each extra
// character in the longer string counts as one difference
const _checkDifferences = (first, second) => {
    const length = Math.max(first.length, second.length);
    let differences = 0;

    for (let index = 0; index < length; index++) {
        if (first[index] !== second[index]) {
            differences++;
        }
    }

    return differences;
};

compareCollections(array[1]);
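One limitation of `_checkDifferences` is that it compares characters position by position, so a single letter inserted in the middle of a name ('Jooshua' vs 'Joshua') shifts every following character and counts as five differences. A common alternative, swapped in here rather than taken from this answer, is Levenshtein edit distance, which counts single-character insertions, deletions and substitutions. This is a minimal sketch assuming names are compared case-insensitively and that the distances of both fields are summed against one threshold; `levenshtein`, `isSimilar` and `countUniqueRecords` are hypothetical helpers, the last one producing the unique-record count the question asks for.

```javascript
// Levenshtein (edit) distance: minimum number of single-character
// insertions, deletions and substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: two records match if the summed edit distance of
// both lower-cased name fields is within the threshold.
function isSimilar(r1, r2, threshold = 1) {
  const d =
    levenshtein(r1.FirstName.toLowerCase(), r2.FirstName.toLowerCase()) +
    levenshtein(r1.LastName.toLowerCase(), r2.LastName.toLowerCase());
  return d <= threshold;
}

// Hypothetical helper: greedily keep one representative per group of
// similar records and return how many representatives remain.
function countUniqueRecords(records, threshold = 1) {
  const representatives = [];
  for (const record of records) {
    if (!representatives.some(rep => isSimilar(rep, record, threshold))) {
      representatives.push(record);
    }
  }
  return representatives.length;
}

const people = [
  { FirstName: 'Joshua', LastName: 'smith' },
  { FirstName: 'Joshuaa', LastName: 'smith' },
  { FirstName: 'Jane', LastName: 'Doe' }
];

console.log(countUniqueRecords(people)); // 2
```

The grouping is greedy, so the result can depend on record order, and similarity is not transitive (A can be close to B and B close to C while A is far from C). Comparing each record with every representative is O(n²), which is usually fine for a file of names but worth keeping in mind for large inputs.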

Answer 2

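If we're talking about duplicates, let's first pin down what counts as a duplicate. I can imagine a case where a name that looks like a typo is actually someone's real name. For your problem, what you may need is some kind of Bayesian filter.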

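Personally, I would just convert the array into an object keyed by last name (cheap), then turn it back into an array. Note that this treats every record with the same LastName as a duplicate, and the last such record wins: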

const array = [
    { FirstName: 'Joshua', LastName: 'smith' },
    { FirstName: 'Joshuaa', LastName: 'smith' }
];

const test = array.reduce((acc, el) => ({
    ...acc,
    [el.LastName]: { ...el }
}), {});
const output = Object.values(test);
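If you want to try the regular-expression idea from the question, one place it can fit is in building the key for this kind of keyed lookup. This is a minimal sketch, not part of the answer above: the hypothetical `normalize` helper lower-cases a name, strips anything outside ASCII a-z, and collapses runs of the same letter with a back-reference, so 'Joshuaa' and 'Joshua' produce the same key. It uses a `Map` keyed on both names instead of spreading into a new object on every step, and keeps the first record seen rather than the last. It only catches doubled letters, and it also merges genuinely different names such as 'Anna' and 'Ana'.

```javascript
// Hypothetical normalizer: lower-case, drop non a-z characters (assumes
// ASCII names), then collapse repeated letters ("joshuaa" -> "joshua").
const normalize = name =>
  name.toLowerCase().replace(/[^a-z]/g, '').replace(/([a-z])\1+/g, '$1');

// Keep the first record seen for each normalized "first|last" key.
function dedupeByNormalizedName(records) {
  const byKey = new Map();
  for (const record of records) {
    const key = `${normalize(record.FirstName)}|${normalize(record.LastName)}`;
    if (!byKey.has(key)) {
      byKey.set(key, record);
    }
  }
  return [...byKey.values()];
}

const records = [
  { FirstName: 'Joshua', LastName: 'smith' },
  { FirstName: 'Joshuaa', LastName: 'smith' },
  { FirstName: 'Joshua', LastName: 'Jones' }
];

const unique = dedupeByNormalizedName(records);
console.log(unique.length); // 2
```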

How to treat similar data in a JSON object file as the same
