Excel - 在两列中查找类似的地址,并为每个唯一地址分配ID

时间:2014-10-02 17:12:50

标签: python vb.net excel function spreadsheet

我有两列A和B用于一组地址。两列中可能存在类似的地址。我需要删除A列和B列中的重复地址,并为每个唯一地址分配一个ID。然后我需要返回到原始列表(带有重复项)并使用这些ID插入另外两列。原始列的示例: -

Col A               Col B
----------        -----------
address 1a        address 2b
address 2b        address 7g
address 3c        address 1a 
address 4d        address 8h
address 5e        address 6f
address 6f        address 1a
                  address 9i

我需要删除重复项,因此它将是这样的: -

Col A new          Col B new
----------        -----------
address 1a        
address 2b        address 7g
address 3c         
address 4d        address 8h
address 5e        
address 6f        address 9i

然后我需要为每个地址分配ID

Col C unique address      Col D ID  
----------                ---------  
address 1a                P000001
address 2b                P000002
address 3c                P000003
address 4d                P000004
address 5e                P000005
address 6f                P000006
address 7g                P000007
address 8h                P000008
address 9i                P000009

然后返回原始列并将这些ID添加到每个地址,如下所示: -

Col A           Col AID           Col B              Col BID
----------      -----------       -----------        -----------
address 1a      P000001           address 2b         P000002
address 2b      P000002           address 7g         P000007
address 3c      P000003           address 1a         P000001
address 4d      P000004           address 8h         P000008
address 5e      P000005           address 6f         P000006
address 6f      P000006           address 1a         P000001
                                  address 9i         P000009

到目前为止,我所做的是创建一个包含两列所有地址的新列。然后我只使用Remove Duplicates函数删除重复项以获取唯一的地址。我被困在ID部分并返回原始列以分配新ID。

ID需要采用这种格式 - P ######(#是数字)

我有超过100k的地址要通过。我愿意使用Python或VB来实现这一目标。

谢谢你们。非常感谢您的帮助。

2 个答案:

答案 0 :(得分:0)

第1步:

将两列一起复制到一个新列中的列中(可以是新工作表或同一工作表上的空范围。)

第2步:

选择此新的单元格范围,然后从功能区中的Remove Duplicates标签中选择Data。只需选择Continue with the Current Selection,点击Remove Duplicates,然后点击OK

第3步:

除了新的无重复列表之外,还有唯一的ID。

第4步:

在A列和B列之间插入一列。

第5步:

在列B和D中使用类似:=VLOOKUP(A2, "THE RANGE OF THE ADDRESSES AND ID's", 2, FALSE)的公式。

步骤6(可选):

VLOOKUP公式上复制并粘贴值,并删除唯一的地址/ ID表。

答案 1 :(得分:0)

编辑:代码已优化。减少运行时间

这是一个VBA例程,可以做到这一点。在我的计算机上,处理220,000个地址花费了大约6秒钟,其中大约150,000个地址是唯一的。

这假设您的数据在A1中开始,并且包含一个不同的区域。

  • 使用地址和ID
  • 的属性为cAdrKey定义类对象
  • 将数据读入数组
  • 使用Key属性将数组元素作为Class对象添加到集合中,以防止重复
  • 为每个地址非重复地址添加唯一ID
  • 使用四列
  • 设置结果数组
  • 从集合对象中获取每个地址的唯一ID
  • 将结果写入工作表

首先定义一个类(插入/类模块)并重命名模块 cAdrKey

将此代码粘贴到Class模块中:

==================================

Option Explicit
Private pAddr As String
Private pID As String

Public Property Get Addr() As String
    Addr = pAddr
End Property
Public Property Let Addr(Value As String)
    pAddr = Value
End Property

Public Property Get ID() As String
    ID = pID
End Property
Public Property Let ID(Value As String)
    pID = Value
End Property

===================================

然后,在常规模块中粘贴此代码:

====================================

Option Explicit
Sub ProcessAddresses()
    Dim wsSrc As Worksheet, wsRes As Worksheet
    Dim rSrc As Range, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim colAddr As Collection, cAK As cAdrKey
    Dim I As Long, J As Long, K As Long
    Dim O As Object

'Set worksheets for Source and Results
Set wsSrc = Worksheets("Sheet1")
Set wsRes = Worksheets("Sheet1")

'Set first cell for results
Set rRes = Range("F1")

'Get Source data
With wsSrc
    Set rSrc = .Range("A1").CurrentRegion
End With
vSrc = rSrc

'Collect list of Unique Addresses
Set colAddr = New Collection
On Error Resume Next
For I = 1 To UBound(vSrc)
    For J = 1 To UBound(vSrc, 2)
        Set cAK = New cAdrKey
        With cAK
            .Addr = vSrc(I, J)
            .ID = Format(K, "\P000000")
            colAddr.Add cAK, CStr(.Addr)
            If Err.Number = 0 Then K = K + 1
            Err.Clear
        End With
    Next J
Next I
On Error GoTo 0        

'Generate Results
ReDim vRes(1 To UBound(vSrc), 1 To 4)
For I = 1 To UBound(vSrc)
    vRes(I, 1) = vSrc(I, 1)
    vRes(I, 2) = colAddr(vSrc(I, 1)).ID
    vRes(I, 3) = vSrc(I, 2)
    vRes(I, 4) = colAddr(vSrc(I, 2)).ID
Next I

Application.ScreenUpdating = False
Set rRes = rRes.Resize(UBound(vRes), UBound(vRes, 2))
rRes.EntireColumn.Clear
rRes = vRes
rRes.EntireColumn.AutoFit
Application.ScreenUpdating = True

End Sub

====================================