How best to flatten a tab-delimited table?

Time: 2010-08-03 21:11:59

Tags: vb.net string split

Here is a sample data table (the data is fake, but the format matches the business data I receive from an external system):

 R1   Pears,Apples,Bananas    10
      Oranges         5
 R2   Apricots        15
      Bananas         222
      Apples,Oranges  15

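The data is a string. Columns are tab-delimited and rows are CRLF-delimited. The output should use the same format. Multiple values within a cell are comma-delimited.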

Here is the desired "flattened" output:

 R1   Pears     10
 R1   Apples    10
 R1   Bananas   10
 R1   Oranges   5
 R2   Apricots  15
 R2   Bananas   222
 R2   Apples    15
 R2   Oranges   15

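Every blank cell is filled in with the value from the row above, and each row whose column holds multiple (comma-delimited) values is repeated once per value, with the other columns copied into every repeat.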

  • Complicating assumption: the input can have any number of columns.
  • Simplifying assumption: only one column is comma-delimited.
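
To make the two rules above concrete, here is a minimal sketch that applies them to rows already split into cells. The module and function names (FlattenRules, FlattenRows) are hypothetical and not part of the question. It assumes every row has the same number of cells and that the index of the comma-delimited column is known in advance.

```vbnet
Imports System
Imports System.Collections.Generic

Module FlattenRules
    ' Hypothetical helper (not from the question): applies both rules to rows
    ' that are already split into cells. Assumes every row has the same number
    ' of cells and that multiIndex, the comma-delimited column, is known.
    Function FlattenRows(ByVal rows As IEnumerable(Of String()), ByVal multiIndex As Integer) As List(Of String())
        Dim output As New List(Of String())
        Dim previous() As String = Nothing
        For Each cells As String() In rows
            ' Rule 1: a blank cell takes the value from the last emitted row.
            Dim filled() As String = CType(cells.Clone(), String())
            If previous IsNot Nothing Then
                For i As Integer = 0 To filled.Length - 1
                    If String.IsNullOrEmpty(filled(i)) Then filled(i) = previous(i)
                Next
            End If
            ' Rule 2: emit one copy of the row per comma-delimited value.
            For Each value As String In filled(multiIndex).Split(","c)
                Dim copy() As String = CType(filled.Clone(), String())
                copy(multiIndex) = value
                output.Add(copy)
                previous = copy
            Next
        Next
        Return output
    End Function
End Module
```

Because the fill-down source is the last emitted row, a blank cell in the multi-value column would inherit only the last of the values above it, not the whole comma-delimited set.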

I'm working through a relatively naive solution (loops and a bit of recursion), but I'm curious whether LINQ or some other approach would be a better fit here.

I'm currently using VB.NET, but C# is fine too.

Here is my answer so far... Optimization isn't very important to me, but clarity is always a good thing.

Public Shared Function FlattenPlainTextTable(ByVal InputTable As String) As String
  Const RowDelimiter As String = vbCrLf
  Const ColDelimiter As String = vbTab
  Const MultDelimiter As String = ","
  ' First pass: determine the number of columns and which column, if any, contains
  ' multiple values; build a new collection of rows pre-split into columns so the
  ' split work can be reused for the second pass.
  Dim rows As New System.Collections.Generic.List(Of String())
  Dim maxColumnIndex As Integer
  Dim multiValueColumnIndex As Integer = -1
  Dim thisRow() As String
  Dim foundComma As Integer
  For Each row As String In Split(InputTable, RowDelimiter)
    ' Skip empty lines (for example after a trailing CRLF); otherwise the
    ' fill-down logic would repeat the previous row.
    If row.Length = 0 Then Continue For
    thisRow = Split(row, ColDelimiter)
    rows.Add(thisRow)
    maxColumnIndex = Math.Max(maxColumnIndex, thisRow.GetUpperBound(0))
    If multiValueColumnIndex < 0 Then
      ' We haven't found a multi-value column yet. The function supports at
      ' most one multi-value column. Look for a comma in this row, and if
      ' one is found, make its column the multi-value column.
      foundComma = row.IndexOf(MultDelimiter)
      If foundComma >= 0 Then
        ' Substring takes a length, so this is everything before the comma.
        Dim beforeComma As String = row.Substring(0, foundComma)
        ' The column index is the number of column delimiters found before
        ' the comma. Faster than splitting into an array and looking for
        ' the comma.
        multiValueColumnIndex = beforeComma.Length - beforeComma.Replace(ColDelimiter, "").Length
      End If
    End If
  Next
  ' If no multi-value column was found, pretend it's the first column; the
  ' logic is simpler if we can assume there is one.
  If multiValueColumnIndex < 0 Then multiValueColumnIndex = 0
  ' Initialize lastRow with one blank per column found in the original input
  ' (N commas split into N + 1 blank strings). lastRow is used to fill down
  ' values where blanks are found on subsequent rows.
  Dim lastRow() As String = Split(New String(","c, maxColumnIndex), ",")
  Dim outputTable As New System.Text.StringBuilder()
  Dim thisVal As String
  Dim multiValueColumnValues() As String
  For Each thisRow In rows
    ' Get the multi-value column's data first so we know how many times to repeat the row.
    If thisRow.GetUpperBound(0) < multiValueColumnIndex Then
      ' If the multi-value column is after the jagged edge of this row, create an array of
      ' one blank value.
      multiValueColumnValues = Split("", MultDelimiter) ' ensures GetUpperBound(0) = 0
    Else
      multiValueColumnValues = Split(thisRow(multiValueColumnIndex), MultDelimiter)
    End If
    ' Repeat this row once for each value found in the multi-value column.
    For rowRepeat As Integer = 0 To multiValueColumnValues.GetUpperBound(0)
      For columnIndex As Integer = 0 To maxColumnIndex
        If columnIndex = multiValueColumnIndex Then
          ' Value is one of the multiple-value values
          thisVal = multiValueColumnValues(rowRepeat)
        ElseIf thisRow.GetUpperBound(0) < columnIndex Then
          ' This row's jagged edge already ended, default to blank
          thisVal = ""
        Else
          thisVal = thisRow(columnIndex)
        End If
        If thisVal = "" Then
          ' Fill down
          thisVal = lastRow(columnIndex)
        Else
          ' Change the fill-down value for next time. (Fill-down only
          ' fills down the *last* value in the multi-value column, not
          ' the whole set.)
          lastRow(columnIndex) = thisVal
        End If
        If columnIndex > 0 Then outputTable.Append(ColDelimiter)
        outputTable.Append(thisVal)
      Next
      outputTable.Append(RowDelimiter)
    Next
  Next
  Return outputTable.ToString()
End Function

1 answer:

Answer 0 (score: 0)

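The following works for a fixed number of columns. How would you like an arbitrary number of columns to be handled? Can you give an example?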

    Dim result As New System.Text.StringBuilder
    Dim fakeData As String = _
    "R1" & vbTab & "Pears,Apples,Bananas" & vbTab & "10" & vbCrLf & _
    vbTab & "Oranges" & vbTab & "5" & vbCrLf & _
    "R2" & vbTab & "Apricots" & vbTab & "15" & vbCrLf & _
    vbTab & "Bananas" & vbTab & "222" & vbCrLf & _
    vbTab & "Apples,Oranges" & vbTab & "15"

    Dim allLines() As String = Microsoft.VisualBasic.Split(fakeData, vbCrLf)
    Dim firstColText As String = String.Empty
    For Each line As String In allLines
        Dim allCols() As String = Microsoft.VisualBasic.Split(line, vbTab)
        Dim allFruits() As String = Microsoft.VisualBasic.Split(allCols(1), ",")
        If allCols(0).Length > 0 Then firstColText = allCols(0)
        For Each fruit As String In allFruits
            result.Append(firstColText).Append(vbTab).Append(fruit).Append(vbTab).Append(allCols(2)).Append(vbCrLf)
        Next
    Next
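
One possible way to lift the fixed-column restriction of the loop above is sketched below, using a little LINQ for the splitting and padding. This is only a sketch under stated assumptions, not a definitive implementation: the names FlattenAnyWidth and Flatten are hypothetical, at most one cell per row is assumed to contain commas, that cell is detected per row rather than once for the whole table, and empty lines are skipped.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports Microsoft.VisualBasic

Module FlattenAnyWidth
    ' Hypothetical function (not from the answer above): flattens a table with
    ' any number of columns. Assumes at most one cell per row contains commas.
    Function Flatten(ByVal input As String) As String
        ' Split into rows of cells, skipping empty lines (e.g. after a trailing CRLF).
        Dim rows As List(Of String()) = input.Split(New String() {vbCrLf}, StringSplitOptions.None) _
            .Where(Function(line) line.Length > 0) _
            .Select(Function(line) line.Split(ControlChars.Tab)) _
            .ToList()
        If rows.Count = 0 Then Return ""

        Dim width As Integer = rows.Max(Function(r) r.Length)
        Dim lastSeen() As String = Enumerable.Repeat("", width).ToArray()
        Dim result As New StringBuilder()

        For Each row As String() In rows
            ' Pad short (jagged) rows and fill blanks from the last emitted row.
            Dim source() As String = row
            Dim fillFrom() As String = lastSeen
            Dim cells() As String = Enumerable.Range(0, width) _
                .Select(Function(i) If(i < source.Length AndAlso source(i) <> "", source(i), fillFrom(i))) _
                .ToArray()

            ' Find the comma-delimited cell on this row, if there is one.
            Dim multiIndex As Integer = Array.FindIndex(cells, Function(c) c.Contains(","))
            Dim values() As String = If(multiIndex >= 0, cells(multiIndex).Split(","c), New String() {""})

            ' Emit the row once per value (exactly once when there is no comma).
            For Each value As String In values
                If multiIndex >= 0 Then cells(multiIndex) = value
                result.Append(String.Join(vbTab, cells)).Append(vbCrLf)
            Next

            ' Remember the last emitted row as the fill-down source.
            lastSeen = cells
        Next
        Return result.ToString()
    End Function
End Module
```

Calling Flatten(fakeData) with the sample string from the answer should give the eight rows listed in the question. Unlike the fixed-column loop, this also fills down blanks in columns other than the first.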