我有两个表填充了设施名称和邮政编码(以及一堆不相关的其他数据)。这些表来自两个不同的来源,我想将它们联系起来。问题是,设施名称的写法略有不同(缩写,撇号等)。邮政编码恰好是一个很好的限制因素,只能减少几千个潜在的匹配。我的问题是,
如果给定邮政编码匹配和近似名称匹配,我如何匹配两个表:
|facility|zip | |facility|zip |
|azyt |10000| No Match|aaaa |10000|
|abba |10000| Match |abb' |10000|
我想到的一种方法是表A中的每一行,梳理表B的邮政编码,找到所有具有匹配邮政编码的行。然后以某种方式测试哪个名称与MOST相似,可能是通过比较最左边的字符(?)。总有比赛,只是不完美。
这是我不完整的代码:
Dim facilityName, facilityZip, otherName
For i = 1 To Worksheets("Facility").UsedRange.Rows.Count
facilityName = Worksheets("Facility").Cells(i, 2)
facilityZip = Worksheets("Facility").Cells(i, 4)
'Grab all rows in OTHER that have same ZIP
For j = 1 To Worksheets("Other").UsedRange.Rows.Count
otherName = Array()
ub = UBound(otherName) + 1
If Worksheets("Other").Cells(j, 3).Value = facilityZip Then
ReDim Preserve otherName(0 To ub)
otherName(ub) = j
End If
Next j
'Compare names
For Each rw In otherName
'here I would compare each result to the current facilityName. There's likely a better way to do this...
Next rw
Next i
对此代码的帮助,建议甚至是不同的方法都会非常有用!
谢谢!
答案 0 :(得分:0)
使用here中的Levenshtein距离法:
Dim facilityName, facilityZip, hudName, hudZip, provNum, lowerLen, levNum, tmpLev, resultHudNum, resultHudName
For i = 1 To Worksheets("Facility").UsedRange.Rows.Count
With Worksheets("Facility")
facilityName = .Cells(i, 2)
facilityZip = .Cells(i, 3)
End With
levNum = 5
resultHudNum = 0
For j = 1 To Worksheets("Sheet1").UsedRange.Rows.Count
With Worksheets("Sheet1")
hudName = .Cells(j, 2)
hudZip = .Cells(j, 4)
hudNum = .Cells(j, 1)
End With
If hudZip = facilityZip Then
If Len(facilityName) < Len(hudName) Then
lowerLen = Len(facilityName)
Else
lowerLen = Len(hudName)
End If
tmpLev = Levenshtein(Mid(facilityName, 1, lowerLen), Mid(hudName, 1, lowerLen))
If tmpLev < levNum Then
levNum = tmpLev
resultHudNum = hudNum
resultHudName = hudName
End If
End If
Next j
If resultHudNum <> 0 Then
Worksheets("Facility").Cells(i, 4).Value = resultHudNum
Worksheets("Facility").Cells(i, 5).Value = resultHudName
Else
Worksheets("Facility").Cells(i, 4).Value = "000000"
Worksheets("Facility").Cells(i, 5).Value = ""
End If
Next i
可以通过更改levNum中的值来修改stings之间所需的相似度。
谢谢@Tim的帮助!