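I'm working on a project involving 5 SQL-like databases, and I need to detect and filter out duplicates. I think I'm on the right track, but I'm not there yet. I'm trying to follow these steps: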
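1. Start a .forEach() on the main array of objects, with item as the argument.
2. Create a filtered array with let filtered = Array.filter(x => x.id !== item.id); so that an item is never compared with itself.
3. Start a .forEach() on the filtered array, with comparison as the argument.
4. Declare the similarity variables (nameSimilarity, phoneSimilarity and emailSimilarity).
5. If item.email and comparison.email are not empty, compare the strings and store the similarity percentage in emailSimilarity; otherwise set emailSimilarity = 0.
6. If item.phone and comparison.phone are not empty, compare the strings and store the similarity percentage in phoneSimilarity; otherwise set phoneSimilarity = 0.
7. Combine item.firstName and item.lastName into a variable named itemFullName, and combine comparison.firstName and comparison.lastName into a variable named comparisonFullName.
8. If itemFullName and comparisonFullName are not empty, compare the strings and store the similarity percentage in nameSimilarity; otherwise set nameSimilarity = 0.
9. If any of the percentages in emailSimilarity, nameSimilarity or phoneSimilarity is above the threshold, add item, together with the similarity variables and comparison.id, to the duplicates array and splice it out of the original array.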
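The three field checks in the steps above follow the same pattern, so they can be folded into one helper. The following is only a minimal sketch under stated assumptions: compare(a, b) is assumed to return a score between 0 and 1 (the role strcmp.jaro plays in the question), and the names THRESHOLD, exactMatch, fieldSimilarity, fullName and scorePair are hypothetical. exactMatch is just a stand-in so the snippet runs without the library.

```javascript
// Hypothetical threshold; the question's code uses 0.89.
const THRESHOLD = 0.89;

// Stand-in comparer so the sketch runs on its own:
// 1 for identical strings, 0 otherwise. Swap in strcmp.jaro in real use.
const exactMatch = (a, b) => (a === b ? 1 : 0);

// Returns 0 when either value is empty or missing, otherwise the comparer's score.
function fieldSimilarity(a, b, compare) {
  if (!a || !b) return 0;
  return compare(a, b);
}

// Joins first and last name, dropping missing parts.
function fullName(record) {
  return [record.firstName, record.lastName].filter(Boolean).join(' ').trim();
}

// Scores one pair of records and reports whether any score exceeds the threshold.
function scorePair(item, comparison, compare = exactMatch) {
  const similarEmail = fieldSimilarity(item.email, comparison.email, compare);
  const similarPhone = fieldSimilarity(item.phone, comparison.phone, compare);
  const similarName = fieldSimilarity(fullName(item), fullName(comparison), compare);
  const isDuplicate = Math.max(similarEmail, similarPhone, similarName) > THRESHOLD;
  return { similarEmail, similarPhone, similarName, isDuplicate };
}

const a = { id: 1, firstName: 'Ann', lastName: 'Lee', email: 'ann@example.com', phone: '' };
const b = { id: 2, firstName: 'Anne', lastName: 'Lee', email: 'ann@example.com', phone: '' };
console.log(scorePair(a, b));
```

Math.max is used for the "any of the percentages" test because an expression of the form (a || b || c) > THRESHOLD evaluates to the first non-zero score only, so a high score in a later field can go unnoticed.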
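Here is the code I wrote to follow these steps, but I seem to be getting repeated entries in the duplicates array. I'm not sure why it isn't working as expected, but I have a hunch that I can't rely on the original array being mutated during a forEach() operation.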
fullArray.forEach(item => {
    let filtered = fullArray.filter(x => x.externalId !== item.externalId);
    filtered.forEach(comparison => {
        let emailSimilarity, phoneSimilarity, nameSimilarity;
        if ((item.email !== '') && (comparison.email !== '')) {
            emailSimilarity = strcmp.jaro(item.email, comparison.email);
        } else {
            emailSimilarity = 0;
        }
        if ((item.phone !== '') && (comparison.phone !== '')) {
            phoneSimilarity = strcmp.jaro(item.phone, comparison.phone);
        } else {
            phoneSimilarity = 0;
        }
        let itemFullName = `${item.firstName} ${item.LastName}`.trim() || '';
        let comparisonFullName = `${comparison.firstName} ${comparison.LastName}`.trim();
        if (((itemFullName !== '') && (comparisonFullName !== '')) || ((itemFullName.indexOf('Group') !> 0) && (comparisonFullName.indexOf('Group') !> 0))) {
            nameSimilarity = strcmp.jaro(itemFullName, comparisonFullName);
        } else {
            nameSimilarity = 0;
        }
        if ((emailSimilarity || phoneSimilarity || nameSimilarity) > 0.89) {
            let dupesOutput = Object.assign({}, item, { similarName: nameSimilarity, similarEmail: emailSimilarity, similarPhone: phoneSimilarity, similarTo: comparison.externalId });
            dupes.push(dupesOutput);
            fullArray = fullArray.filter(x => x.externalId !== item.externalId);
        }
    });
});
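Where is the problem?

Answer 0 (score: 2)

Assuming the similarity check works, the problem is that you assign a new array to fullArray while you are still inside the forEach loop over the old array.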
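A small sketch of that behaviour, using a made-up array of numbers (the names list and visited are hypothetical): forEach keeps iterating the array it was called on, so pointing the variable at a newly filtered array has no effect on the loop that is already running.

```javascript
let list = [1, 2, 3, 4];
const visited = [];

list.forEach(value => {
  visited.push(value);
  // filter returns a new array; the loop still walks the old one.
  list = list.filter(x => x !== 2);
});

console.log(visited); // [1, 2, 3, 4]: the value 2 was still visited
console.log(list);    // [1, 3, 4]
```

Note also that forEach cannot stop early, so in the question's inner loop an item that is similar to several records is pushed to dupes once per match, whereas some stops at the first match.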
I suggest you use Array.filter instead:
const filteredArray = fullArray.filter(item => {
    // Keep the item only if no other record is similar to it.
    return !fullArray.some(comparison => {
        // Skip the comparison of a record with itself.
        if (comparison.externalId === item.externalId) {
            return false;
        }
        let emailSimilarity, phoneSimilarity, nameSimilarity;
        if ((item.email !== '') && (comparison.email !== '')) {
            emailSimilarity = strcmp.jaro(item.email, comparison.email);
        } else {
            emailSimilarity = 0;
        }
        if ((item.phone !== '') && (comparison.phone !== '')) {
            phoneSimilarity = strcmp.jaro(item.phone, comparison.phone);
        } else {
            phoneSimilarity = 0;
        }
        const itemFullName = `${item.firstName} ${item.LastName}`.trim();
        const comparisonFullName = `${comparison.firstName} ${comparison.LastName}`.trim();
        // The question's `!> 0` is not valid JavaScript. It is read here literally as
        // "not greater than 0"; adjust this if a different 'Group' check was intended.
        const itemHasNoGroup = !(itemFullName.indexOf('Group') > 0);
        const comparisonHasNoGroup = !(comparisonFullName.indexOf('Group') > 0);
        if (((itemFullName !== '') && (comparisonFullName !== '')) || (itemHasNoGroup && comparisonHasNoGroup)) {
            nameSimilarity = strcmp.jaro(itemFullName, comparisonFullName);
        } else {
            nameSimilarity = 0;
        }
        // Math.max tests every score; (a || b || c) would only test the first non-zero one.
        if (Math.max(emailSimilarity, phoneSimilarity, nameSimilarity) > 0.89) {
            const dupesOutput = Object.assign({}, item, { similarName: nameSimilarity, similarEmail: emailSimilarity, similarPhone: phoneSimilarity, similarTo: comparison.externalId });
            dupes.push(dupesOutput);
            // Returning true stops some() at the first match, so item is pushed only once.
            return true;
        }
        return false;
    });
});
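One thing to be aware of: assuming the similarity check is symmetric, this filter removes every record that has a similar counterpart, so both records of a similar pair leave filteredArray. If the goal is instead to keep one record per group, one possible variant compares each record only against those already kept. This is a sketch under that assumption, not part of the original answer; isSimilar, dedupeKeepFirst and the sample records are hypothetical, and isSimilar is a simple stand-in for the strcmp.jaro-based check.

```javascript
// Stand-in for the similarity check: records with the same non-empty email count as duplicates.
const isSimilar = (a, b) => a.email !== '' && a.email === b.email;

// Keeps the first record of each group of similar records and collects the rest in dupes.
function dedupeKeepFirst(records, similar = isSimilar) {
  const kept = [];
  const dupes = [];
  for (const item of records) {
    // find stops at the first match, so each item is reported at most once.
    const match = kept.find(other => similar(item, other));
    if (match) {
      dupes.push(Object.assign({}, item, { similarTo: match.externalId }));
    } else {
      kept.push(item);
    }
  }
  return { kept, dupes };
}

const records = [
  { externalId: 'a1', email: 'ann@example.com' },
  { externalId: 'b7', email: 'bob@example.com' },
  { externalId: 'c3', email: 'ann@example.com' },
];
const result = dedupeKeepFirst(records);
console.log(result.kept.map(r => r.externalId)); // ['a1', 'b7']
console.log(result.dupes.map(r => r.similarTo)); // ['a1']
```

Because nothing is reassigned or spliced while iterating, the loop order stays predictable, and the input array is left untouched.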