我需要帮助找出一种简单的方法(最好是一个宏)来将在1个单元格中输入的数据(用空格分隔 - 长度不固定)分成它们自己的列。我有大约10万个条目可供使用。
我目前正在使用Left,Right,Find功能的组合来做这件事但它永远带我。
示例数据布局(A列中的所有内容):http://prntscr.com/32l81u
我正在寻找一种方法,根据条目将每一行拆分成列。具有固定长度的唯一信息是每组长度为9个字符的第一个条目。我需要每组中的所有条目都是它们自己的列。每组由---和|
系列分隔非常感谢任何帮助。
谢谢!
答案 0 :(得分:1)
您的示例源数据包含四种不同的格式。您可能认为只有四种格式,但我对此类任务的体验是,您会发现另一种格式从第312行开始,然后是第1543行,依此类推。
您必须为您希望找到的数据编写宏的第一个版本,但必须检查数据是否符合您的期望。如果集合未能符合预期,请修改宏以处理此新格式以及以前的格式并再次尝试宏。在宏可以成功解码整个源工作表之前,可能需要许多版本。
下面的代码包含用于说明它正在做什么的注释,但在这里我将解释我的方法。
代码包含许多形式的语句:Debug.Assert boolean-expression
。这些语句检查我的假设,如果表达式返回False,代码将停止。如果假设是假的,我没有试图继续进行。有可能跳到下一组并继续发现更多错误的假设,但我总是发现最容易解决一个问题。
我将集合的内容作为令牌列表处理,我期望它出现在少量固定序列中。我将一个令牌定义为两个或多个空格与其邻居分开的东西。我有一个例程GetTokenArray
,它将单元格值拆分的复杂性转移到自己的例程中。如果我对令牌的定义有问题,那么这就是需要重写的例程。
请注意:令牌不必与列匹配。如果我对源数据的理解是正确的,那么集合的第一个标记就是:“aaaaaaaaa Doe,John”。这将不得不分为两个。
在示例1和2中,第二行开始MM/DD/YY City, St ZCode
,而在示例3和4中,它开始MM/DD/YY LName, FNam MName
。如果您必须允许非美国地址,则需要一些聪明的代码才能可靠地区分地址和名称。示例3和示例4使用MM/DD/YY City, St ZCode
作为第4行的开头。除非宏可以识别这两种格式之间存在某些固定的差异,否则您会遇到严重问题。
将整个集合作为数组的一个原因是,如果这有助于识别集合的格式,它允许宏容易地向前看。
我希望下面的代码能让你开始。
Option Explicit
Sub CtrlDecode()
Dim InxTA As Long
Dim RowOutCrnt As Long
Dim RowSrcCrnt As Long
Dim RowSrcLast As Long
Dim TokenPart() As String
Dim SrcData As Variant
Dim TokenArray() As String
' It might be possible to place output column values directly into a cell
' or into an array for writing to an output row. However, it seems to me
' that it will be easier to place column values into specific variables
' as the set is processed and then move them when processing has finished.
' No doubt you will want to replace my variable names with something more
' appropriate.
Dim Id1 As String ' Leading nine character identifier
Dim Name As String
Dim Id2 As String ' XX/XXX/XXXXX
Dim Id3 As String ' XXX/xx
' This macro outputs decoded data to worksheet "Output".
' Change as necessary.
With Worksheets("Output")
' Delete all data created by previous run of macro.
.Cells.EntireRow.Delete
' Set column widths
.Columns("A").ColumnWidth = 11#
.Columns("B").ColumnWidth = 25#
.Columns("C").ColumnWidth = 12#
.Columns("D").ColumnWidth = 7.14
End With
' This macro reads source data from worksheet "Source".
' Change as necessary.
With Worksheets("Source")
' Find last row containing data
RowSrcLast = .Cells(Rows.Count, 1).End(xlUp).Row
' Load all data into an array. I believe this will be more convenient
' that accessing the data cell by cell and it will be faster.
SrcData = .Range(.Cells(1, 1), .Cells(RowSrcLast, 1)).Value
End With
RowSrcCrnt = 1
RowOutCrnt = 1
Do While RowSrcCrnt <= RowSrcLast
' Record start row of current set for error message
Call GetTokenArray(SrcData, RowSrcCrnt, TokenArray)
' First token.
' Expect 9-character-id space FName comma space LName
TokenPart = Split(TokenArray(1), " ")
' Expect three parts: first with length of 9, second ending
' in comma.
Debug.Assert UBound(TokenPart) = 2
Debug.Assert Len(TokenPart(0)) = 9
Debug.Assert Right(TokenPart(1), 1) = ","
Id1 = TokenPart(0)
Name = TokenPart(1) & " " & TokenPart(2)
' Second token.
' Expect xx/xxx/xxxxx
TokenPart = Split(TokenArray(2), "/")
Debug.Assert UBound(TokenPart) = 2
Debug.Assert Len(TokenPart(0)) = 2
Debug.Assert Len(TokenPart(1)) = 3
Debug.Assert Len(TokenPart(2)) = 5
Id2 = TokenArray(2)
' Third token.
' Expect xxx/xx
TokenPart = Split(TokenArray(3), "/")
Debug.Assert UBound(TokenPart) = 1
Debug.Assert Len(TokenPart(0)) = 3
Debug.Assert Len(TokenPart(1)) = 2
Id3 = TokenArray(3)
' Continue with remaining tokens. The next few tokens appears to be
' the same in all formats so you could continue accessing TokenArray(4),
' TokenArray(5) and so on as I have. It then looks as though the tokens
' present vary from format to format. You will then have to use a
' variable, such as InxTA, and have code like:
' If TokenArray(InxTA) is optional token xyz Then
' VariableXyz = TokenArray(InxTA)
' InxTA = InxTA + 1
' Else
' VariableXyz = ""
' End If
' There are a variety of ways of transferring values to the output
' worksheet. I suspect a cell by cell write is the easiest although
' not the fastest.
With Worksheets("Output")
.Cells(RowOutCrnt, "A").Value = Id1
.Cells(RowOutCrnt, "B").Value = Name
.Cells(RowOutCrnt, "C").Value = Id2
.Cells(RowOutCrnt, "D").Value = Id3
RowOutCrnt = RowOutCrnt + 1
End With
Loop
End Sub
Sub GetTokenArray(SrcData As Variant, RowSrcCrnt As Long, TokenArray() As String)
' * SrcData(1 To X, 1 To 1) contains all the data from the source worksheet
' * On entry, RowSrcrnt points at the first row of a set. On exit, it points
' at the first row of the next set.
' * On exit, TokenArray will dimensioned as (1 To N) where N is the number of
' tokens found. Each entry will contain one token in the sequence found.
' * A set is ended by a row starting "-----" and containing nothing but
' hyphens and pipes.
' * A token is separated from its neighbour by two or more spaces.
Debug.Assert RowSrcCrnt < UBound(SrcData, 1) ' SrcData already processed
' The upper bound of 500 is intended to be more than could possibly be
' required so as not to bother with enlarginging a small array when it
' becomes full.
ReDim TokenArray(1 To 500)
Dim DataCrnt As String
Dim InxTA As Long
Dim PosCrnt As Long
Dim Pos2Spaces As Long
InxTA = 0
Do While True
DataCrnt = SrcData(RowSrcCrnt, 1)
Debug.Assert DataCrnt <> "" ' Unexpected empty row
' Check for row being end of set
If Left(DataCrnt, 5) = "-----" And _
Replace(Replace(DataCrnt, "-", ""), "|", "") = "" Then
' End of set
Exit Do
End If
' Extract all token from current row
PosCrnt = 1
Do While True
Do While True
' Search for non-space
If PosCrnt > Len(DataCrnt) Then
' End of cell value
Exit Do
End If
If Mid(DataCrnt, PosCrnt, 1) <> " " Then
' Start of next token found
Exit Do
End If
PosCrnt = PosCrnt + 1
Loop ' until end of cell or next token
If PosCrnt > Len(DataCrnt) Then
' This cell finished
RowSrcCrnt = RowSrcCrnt + 1
Debug.Assert RowSrcCrnt < UBound(SrcData) ' Last set not terminated
Exit Do
Else
' Have found first character of next token
Pos2Spaces = InStr(PosCrnt, DataCrnt, " ")
If Pos2Spaces = 0 Then
' Everything up to end of cell is last token of cell
InxTA = InxTA + 1
' Trim to remove trailing single space
TokenArray(InxTA) = Trim(Mid(DataCrnt, PosCrnt))
' This cell finished
RowSrcCrnt = RowSrcCrnt + 1
Debug.Assert RowSrcCrnt <= UBound(SrcData) ' Last set not terminated
Exit Do
Else
' Everything up to 2 spaces is next token of this cell
InxTA = InxTA + 1
TokenArray(InxTA) = Mid(DataCrnt, PosCrnt, Pos2Spaces - PosCrnt)
PosCrnt = Pos2Spaces + 2
End If
End If
Loop ' until end of cell
Loop ' until end of set
Debug.Assert InxTA > 0 ' Empty set
' Discard unused entries
ReDim Preserve TokenArray(1 To InxTA)
RowSrcCrnt = RowSrcCrnt + 1 ' Step over dividing row
End Sub
答案 1 :(得分:0)
尝试一下:
Sub parser()
Dim N As Long, wf As WorksheetFunction
Set wf = Application.WorksheetFunction
N = Cells(Rows.Count, "A").End(xlUp).Row
Dim i As Long, j As Long, k As Long
For i = 1 To N
ary = Split(wf.Trim(Cells(i, "A").Text), " ")
k = 2
For j = LBound(ary) To UBound(ary)
Cells(i, k).Value = ary(j)
k = k + 1
Next j
Next i
End Sub