宏将文本字符串拆分为自己的列 - 没有固定长度/分隔符

时间:2014-04-06 10:47:46

标签: excel excel-vba vba

我需要帮助找出一种简单的方法(最好是一个宏)来将在1个单元格中输入的数据(用空格分隔 - 长度不固定)分成它们自己的列。我有大约10万个条目可供使用。

我目前正在使用Left,Right,Find功能的组合来做这件事但它永远带我。

示例数据布局(A列中的所有内容):http://prntscr.com/32l81u

我正在寻找一种方法,根据条目将每一行拆分成列。具有固定长度的唯一信息是每组长度为9个字符的第一个条目。我需要每组中的所有条目都是它们自己的列。每组由---和|

系列分隔

非常感谢任何帮助。

谢谢!

2 个答案:

答案 0 :(得分:1)

您的示例源数据包含四种不同的格式。您可能认为只有四种格式,但我对此类任务的体验是,您会发现另一种格式从第312行开始,然后是第1543行,依此类推。

您必须为您希望找到的数据编写宏的第一个版本,但必须检查数据是否符合您的期望。如果集合未能符合预期,请修改宏以处理此新格式以及以前的格式并再次尝试宏。在宏可以成功解码整个源工作表之前,可能需要许多版本。

下面的代码包含用于说明它正在做什么的注释,但在这里我将解释我的方法。

代码包含许多形式的语句:Debug.Assert boolean-expression。这些语句检查我的假设,如果表达式返回False,代码将停止。如果假设是假的,我没有试图继续进行。有可能跳到下一组并继续发现更多错误的假设,但我总是发现最容易解决一个问题。

我将集合的内容作为令牌列表处理,我期望它出现在少量固定序列中。我将一个令牌定义为两个或多个空格与其邻居分开的东西。我有一个例程GetTokenArray,它将单元格值拆分的复杂性转移到自己的例程中。如果我对令牌的定义有问题,那么这就是需要重写的例程。

请注意:令牌不必与列匹配。如果我对源数据的理解是正确的,那么集合的第一个标记就是:“aaaaaaaaa Doe,John”。这将不得不分为两个。

在示例1和2中,第二行开始MM/DD/YY City, St ZCode,而在示例3和4中,它开始MM/DD/YY LName, FNam MName。如果您必须允许非美国地址,则需要一些聪明的代码才能可靠地区分地址和名称。示例3和示例4使用MM/DD/YY City, St ZCode作为第4行的开头。除非宏可以识别这两种格式之间存在某些固定的差异,否则您会遇到严重问题。

将整个集合作为数组的一个原因是,如果这有助于识别集合的格式,它允许宏容易地向前看。

我希望下面的代码能让你开始。

Option Explicit
Sub CtrlDecode()

  Dim InxTA As Long
  Dim RowOutCrnt As Long
  Dim RowSrcCrnt As Long
  Dim RowSrcLast As Long
  Dim TokenPart() As String
  Dim SrcData As Variant
  Dim TokenArray() As String

  ' It might be possible to place output column values directly into a cell
  ' or into an array for writing to an output row.  However, it seems to me
  ' that it will be easier to place column values into specific variables
  ' as the set is processed and then move them when processing has finished.
  ' No doubt you will want to replace my variable names with something more
  ' appropriate.
  Dim Id1 As String      ' Leading nine character identifier
  Dim Name As String
  Dim Id2 As String      ' XX/XXX/XXXXX
  Dim Id3 As String      ' XXX/xx

  ' This macro outputs decoded data to worksheet "Output".
  ' Change as necessary.
  With Worksheets("Output")
    ' Delete all data created by previous run of macro.
    .Cells.EntireRow.Delete
    ' Set column widths
    .Columns("A").ColumnWidth = 11#
    .Columns("B").ColumnWidth = 25#
    .Columns("C").ColumnWidth = 12#
    .Columns("D").ColumnWidth = 7.14
  End With

  ' This macro reads source data from worksheet "Source".
  ' Change as necessary.

  With Worksheets("Source")
    ' Find last row containing data
    RowSrcLast = .Cells(Rows.Count, 1).End(xlUp).Row
    ' Load all data into an array.  I believe this will be more convenient
    ' that accessing the data cell by cell and it will be faster.
    SrcData = .Range(.Cells(1, 1), .Cells(RowSrcLast, 1)).Value
  End With

  RowSrcCrnt = 1
  RowOutCrnt = 1

  Do While RowSrcCrnt <= RowSrcLast
    ' Record start row of current set for error message
    Call GetTokenArray(SrcData, RowSrcCrnt, TokenArray)

    ' First token.
    ' Expect 9-character-id space FName comma space LName
    TokenPart = Split(TokenArray(1), " ")
    ' Expect three parts: first with length of 9, second ending
    ' in comma.
    Debug.Assert UBound(TokenPart) = 2
    Debug.Assert Len(TokenPart(0)) = 9
    Debug.Assert Right(TokenPart(1), 1) = ","
    Id1 = TokenPart(0)
    Name = TokenPart(1) & " " & TokenPart(2)

    ' Second token.
    ' Expect xx/xxx/xxxxx
    TokenPart = Split(TokenArray(2), "/")
    Debug.Assert UBound(TokenPart) = 2
    Debug.Assert Len(TokenPart(0)) = 2
    Debug.Assert Len(TokenPart(1)) = 3
    Debug.Assert Len(TokenPart(2)) = 5
    Id2 = TokenArray(2)

    ' Third token.
    ' Expect xxx/xx
    TokenPart = Split(TokenArray(3), "/")
    Debug.Assert UBound(TokenPart) = 1
    Debug.Assert Len(TokenPart(0)) = 3
    Debug.Assert Len(TokenPart(1)) = 2
    Id3 = TokenArray(3)

    ' Continue with remaining tokens.  The next few tokens appears to be
    ' the same in all formats so you could continue accessing TokenArray(4),
    ' TokenArray(5) and so on as I have.  It then looks as though the tokens
    ' present vary from format to format.  You will then have to use a
    ' variable, such as InxTA, and have code like:
    '   If TokenArray(InxTA) is optional token xyz Then
    '     VariableXyz = TokenArray(InxTA)
    '     InxTA = InxTA + 1
    '   Else
    '     VariableXyz = ""
    '   End If

    ' There are a variety of ways of transferring values to the output
    ' worksheet.  I suspect a cell by cell write is the easiest although
    ' not the fastest.
    With Worksheets("Output")
      .Cells(RowOutCrnt, "A").Value = Id1
      .Cells(RowOutCrnt, "B").Value = Name
      .Cells(RowOutCrnt, "C").Value = Id2
      .Cells(RowOutCrnt, "D").Value = Id3
      RowOutCrnt = RowOutCrnt + 1
    End With

  Loop

End Sub
Sub GetTokenArray(SrcData As Variant, RowSrcCrnt As Long, TokenArray() As String)

  ' * SrcData(1 To X, 1 To 1) contains all the data from the source worksheet
  ' * On entry, RowSrcrnt points at the first row of a set.  On exit, it points
  '   at the first row of the next set.
  ' * On exit, TokenArray will dimensioned as (1 To N) where N is the number of
  '   tokens found.  Each entry will contain one token in the sequence found.
  ' * A set is ended by a row starting "-----" and containing nothing but
  '   hyphens and pipes.
  ' * A token is separated from its neighbour by two or more spaces.

  Debug.Assert RowSrcCrnt < UBound(SrcData, 1) ' SrcData already processed

  ' The upper bound of 500 is intended to be more than could possibly be
  ' required so as not to bother with enlarginging a small array when it
  ' becomes full.
  ReDim TokenArray(1 To 500)

  Dim DataCrnt As String
  Dim InxTA As Long
  Dim PosCrnt As Long
  Dim Pos2Spaces As Long

  InxTA = 0

  Do While True
    DataCrnt = SrcData(RowSrcCrnt, 1)
    Debug.Assert DataCrnt <> ""  ' Unexpected empty row
    ' Check for row being end of set
    If Left(DataCrnt, 5) = "-----" And _
       Replace(Replace(DataCrnt, "-", ""), "|", "") = "" Then
      ' End of set
      Exit Do
    End If

    ' Extract all token from current row
    PosCrnt = 1
    Do While True

      Do While True
        ' Search for non-space
        If PosCrnt > Len(DataCrnt) Then
          ' End of cell value
          Exit Do
        End If
        If Mid(DataCrnt, PosCrnt, 1) <> " " Then
          ' Start of next token found
          Exit Do
        End If
        PosCrnt = PosCrnt + 1
      Loop  ' until end of cell or next token

      If PosCrnt > Len(DataCrnt) Then
        ' This cell finished
        RowSrcCrnt = RowSrcCrnt + 1
        Debug.Assert RowSrcCrnt < UBound(SrcData)  ' Last set not terminated
        Exit Do
      Else
        ' Have found first character of next token
        Pos2Spaces = InStr(PosCrnt, DataCrnt, "  ")
        If Pos2Spaces = 0 Then
          ' Everything up to end of cell is last token of cell
          InxTA = InxTA + 1
          ' Trim to remove trailing single space
          TokenArray(InxTA) = Trim(Mid(DataCrnt, PosCrnt))
          ' This cell finished
          RowSrcCrnt = RowSrcCrnt + 1
          Debug.Assert RowSrcCrnt <= UBound(SrcData)  ' Last set not terminated
          Exit Do
        Else
          ' Everything up to 2 spaces is next token of this cell
          InxTA = InxTA + 1
          TokenArray(InxTA) = Mid(DataCrnt, PosCrnt, Pos2Spaces - PosCrnt)
          PosCrnt = Pos2Spaces + 2
        End If
      End If
    Loop  ' until end of cell
  Loop  ' until end of set

  Debug.Assert InxTA > 0  ' Empty set

  ' Discard unused entries
  ReDim Preserve TokenArray(1 To InxTA)
  RowSrcCrnt = RowSrcCrnt + 1       ' Step over dividing row

End Sub

答案 1 :(得分:0)

尝试一下:

Sub parser()
    Dim N As Long, wf As WorksheetFunction
    Set wf = Application.WorksheetFunction
    N = Cells(Rows.Count, "A").End(xlUp).Row
    Dim i As Long, j As Long, k As Long
    For i = 1 To N
        ary = Split(wf.Trim(Cells(i, "A").Text), " ")
        k = 2
        For j = LBound(ary) To UBound(ary)
            Cells(i, k).Value = ary(j)
            k = k + 1
        Next j
    Next i
End Sub