Excel:不完美的匹配

时间:2014-10-20 22:31:29

标签: excel excel-vba vba

我有两个表填充了设施名称和邮政编码(以及一堆不相关的其他数据)。这些表来自两个不同的来源,我想将它们联系起来。问题是,设施名称的写法略有不同(缩写,撇号等)。邮政编码恰好是一个很好的限制因素,只能减少几千个潜在的匹配。我的问题是,

如果给定邮政编码匹配和近似名称匹配,我如何匹配两个表:

|facility|zip  |         |facility|zip  |
|azyt    |10000| No Match|aaaa    |10000|
|abba    |10000| Match   |abb'    |10000|

我想到的一种方法是表A中的每一行,梳理表B的邮政编码,找到所有具有匹配邮政编码的行。然后以某种方式测试哪个名称与MOST相似,可能是通过比较最左边的字符(?)。总有比赛,只是不完美。

这是我不完整的代码:

 Dim facilityName, facilityZip, otherName

  For i = 1 To Worksheets("Facility").UsedRange.Rows.Count
    facilityName = Worksheets("Facility").Cells(i, 2)
    facilityZip = Worksheets("Facility").Cells(i, 4)

    'Grab all rows in OTHER that have same ZIP
    For j = 1 To Worksheets("Other").UsedRange.Rows.Count
      otherName = Array()
      ub = UBound(otherName) + 1
      If Worksheets("Other").Cells(j, 3).Value = facilityZip Then
        ReDim Preserve otherName(0 To ub)
        otherName(ub) = j
      End If
    Next j

    'Compare names
     For Each rw In otherName
       'here I would compare each result to the current facilityName. There's likely a better way to do this...
     Next rw

    Next i

对此代码的帮助,建议甚至是不同的方法都会非常有用!

谢谢!

1 个答案:

答案 0 :(得分:0)

使用here中的Levenshtein距离法:

Dim facilityName, facilityZip, hudName, hudZip, provNum, lowerLen, levNum, tmpLev, resultHudNum, resultHudName


For i = 1 To Worksheets("Facility").UsedRange.Rows.Count
  With Worksheets("Facility")
    facilityName = .Cells(i, 2)
    facilityZip = .Cells(i, 3)
  End With
  levNum = 5
  resultHudNum = 0
  For j = 1 To Worksheets("Sheet1").UsedRange.Rows.Count
    With Worksheets("Sheet1")
      hudName = .Cells(j, 2)
      hudZip = .Cells(j, 4)
      hudNum = .Cells(j, 1)
    End With
    If hudZip = facilityZip Then
      If Len(facilityName) < Len(hudName) Then
        lowerLen = Len(facilityName)
      Else
        lowerLen = Len(hudName)
      End If
      tmpLev = Levenshtein(Mid(facilityName, 1, lowerLen), Mid(hudName, 1, lowerLen))
      If tmpLev < levNum Then
          levNum = tmpLev
          resultHudNum = hudNum
          resultHudName = hudName
      End If
    End If
  Next j
  If resultHudNum <> 0 Then
    Worksheets("Facility").Cells(i, 4).Value = resultHudNum
    Worksheets("Facility").Cells(i, 5).Value = resultHudName
  Else
    Worksheets("Facility").Cells(i, 4).Value = "000000"
    Worksheets("Facility").Cells(i, 5).Value = ""
  End If
Next i

可以通过更改levNum中的值来修改stings之间所需的相似度。

谢谢@Tim的帮助!