我有两列A和B用于一组地址。两列中可能存在类似的地址。我需要删除A列和B列中的重复地址,并为每个唯一地址分配一个ID。然后我需要返回到原始列表(带有重复项)并使用这些ID插入另外两列。原始列的示例: -
Col A Col B
---------- -----------
address 1a address 2b
address 2b address 7g
address 3c address 1a
address 4d address 8h
address 5e address 6f
address 6f address 1a
address 9i
我需要删除重复项,因此它将是这样的: -
Col A new Col B new
---------- -----------
address 1a
address 2b address 7g
address 3c
address 4d address 8h
address 5e
address 6f address 9i
然后我需要为每个地址分配ID
Col C unique address Col D ID
---------- ---------
address 1a P000001
address 2b P000002
address 3c P000003
address 4d P000004
address 5e P000005
address 6f P000006
address 7g P000007
address 8h P000008
address 9i P000009
然后返回原始列并将这些ID添加到每个地址,如下所示: -
Col A Col AID Col B Col BID
---------- ----------- ----------- -----------
address 1a P000001 address 2b P000002
address 2b P000002 address 7g P000007
address 3c P000003 address 1a P000001
address 4d P000004 address 8h P000008
address 5e P000005 address 6f P000006
address 6f P000006 address 1a P000001
address 9i P000009
到目前为止,我所做的是创建一个包含两列所有地址的新列。然后我只使用Remove Duplicates函数删除重复项以获取唯一的地址。我被困在ID部分并返回原始列以分配新ID。
ID需要采用这种格式 - P ######(#是数字)
我有超过100k的地址要通过。我愿意使用Python或VB来实现这一目标。
谢谢你们。非常感谢您的帮助。
答案 0 :(得分:0)
将两列一起复制到一个新列中的列中(可以是新工作表或同一工作表上的空范围。)
选择此新的单元格范围,然后从功能区中的Remove Duplicates
标签中选择Data
。只需选择Continue with the Current Selection
,点击Remove Duplicates
,然后点击OK
。
除了新的无重复列表之外,还有唯一的ID。
在A列和B列之间插入一列。
在列B和D中使用类似:=VLOOKUP(A2, "THE RANGE OF THE ADDRESSES AND ID's", 2, FALSE)
的公式。
在VLOOKUP
公式上复制并粘贴值,并删除唯一的地址/ ID表。
答案 1 :(得分:0)
编辑:代码已优化。减少运行时间
这是一个VBA例程,可以做到这一点。在我的计算机上,处理220,000个地址花费了大约6秒钟,其中大约150,000个地址是唯一的。
这假设您的数据在A1中开始,并且包含一个不同的区域。
首先定义一个类(插入/类模块)并重命名模块 cAdrKey
将此代码粘贴到Class模块中:
==================================
Option Explicit
Private pAddr As String
Private pID As String
Public Property Get Addr() As String
Addr = pAddr
End Property
Public Property Let Addr(Value As String)
pAddr = Value
End Property
Public Property Get ID() As String
ID = pID
End Property
Public Property Let ID(Value As String)
pID = Value
End Property
===================================
然后,在常规模块中粘贴此代码:
====================================
Option Explicit
Sub ProcessAddresses()
Dim wsSrc As Worksheet, wsRes As Worksheet
Dim rSrc As Range, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim colAddr As Collection, cAK As cAdrKey
Dim I As Long, J As Long, K As Long
Dim O As Object
'Set worksheets for Source and Results
Set wsSrc = Worksheets("Sheet1")
Set wsRes = Worksheets("Sheet1")
'Set first cell for results
Set rRes = Range("F1")
'Get Source data
With wsSrc
Set rSrc = .Range("A1").CurrentRegion
End With
vSrc = rSrc
'Collect list of Unique Addresses
Set colAddr = New Collection
On Error Resume Next
For I = 1 To UBound(vSrc)
For J = 1 To UBound(vSrc, 2)
Set cAK = New cAdrKey
With cAK
.Addr = vSrc(I, J)
.ID = Format(K, "\P000000")
colAddr.Add cAK, CStr(.Addr)
If Err.Number = 0 Then K = K + 1
Err.Clear
End With
Next J
Next I
On Error GoTo 0
'Generate Results
ReDim vRes(1 To UBound(vSrc), 1 To 4)
For I = 1 To UBound(vSrc)
vRes(I, 1) = vSrc(I, 1)
vRes(I, 2) = colAddr(vSrc(I, 1)).ID
vRes(I, 3) = vSrc(I, 2)
vRes(I, 4) = colAddr(vSrc(I, 2)).ID
Next I
Application.ScreenUpdating = False
Set rRes = rRes.Resize(UBound(vRes), UBound(vRes, 2))
rRes.EntireColumn.Clear
rRes = vRes
rRes.EntireColumn.AutoFit
Application.ScreenUpdating = True
End Sub
====================================