Here is a sample data table (the data is fake, but it has the same format as the business data I receive from an external system):
R1	Pears,Apples,Bananas	10
	Oranges	5
R2	Apricots	15
	Bananas	222
	Apples,Oranges	15
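The data is plain strings. Columns are tab-delimited and rows are CRLF-delimited, and the output should use the same delimiters. Multiple values within a cell are comma-separated.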
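As a sketch of reading this format, the snippet splits the text on CRLF into rows and on tab into cells; a blank leading cell survives as an empty string, which is what any fill-down step has to detect. `ParseTable` and `TableParsing` are hypothetical names, and skipping empty lines (for example, after a trailing CRLF) is an assumption about the input.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module TableParsing
    ' Hypothetical helper: CRLF separates rows, tab separates cells.
    ' Assumption: empty lines (e.g. from a trailing CRLF) carry no data and are skipped.
    Public Function ParseTable(input As String) As List(Of String())
        Dim rows As New List(Of String())
        For Each line As String In input.Split(New String() {vbCrLf}, StringSplitOptions.None)
            If line.Length = 0 Then Continue For
            ' String.Split keeps empty entries by default, so a blank first
            ' column shows up as "" at index 0.
            rows.Add(line.Split(ControlChars.Tab))
        Next
        Return rows
    End Function
End Module
```

For the sample above, `ParseTable` should return five rows, and the second row's first cell is the empty string that the fill-down rule replaces with "R1".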
Here is the desired "flattened" output:
R1 Pears 10
R1 Apples 10
R1 Bananas 10
R1 Oranges 5
R2 Apricots 15
R2 Bananas 222
R2 Apples 15
R2 Oranges 15
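Every column is filled down wherever a cell is blank, and a cell with multiple (comma-separated) values is split into one row per value, with the other columns' values repeated in each of those rows.
I'm solving this with a relatively naive solution (loops and some recursion), but I'd like to know whether this is a case where LINQ or some other approach would be a better fit.
I'm currently using VB.NET, but C# is fine too.
Here is my answer so far... Optimization isn't very important to me, but clarity is always a good thing.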
Public Shared Function FlattenPlainTextTable(ByVal InputTable As String) As String
    Const RowDelimiter As String = vbCrLf
    Const ColDelimiter As String = vbTab
    Const MultDelimiter As String = ","

    ' First pass: determine the number of columns and which column, if any, contains
    ' multiple values; build a collection of rows pre-split into columns so the
    ' split work can be reused in the second pass.
    Dim rows As New System.Collections.Generic.List(Of String())
    Dim maxColumnIndex As Integer
    Dim multiValueColumnIndex As Integer = -1
    Dim thisRow() As String
    Dim foundComma As Integer
    For Each row As String In Split(InputTable, RowDelimiter)
        ' Skip blank lines (e.g. after a trailing CRLF); otherwise they would
        ' produce an extra, entirely filled-down output row.
        If row.Length = 0 Then Continue For
        thisRow = Split(row, ColDelimiter)
        rows.Add(thisRow)
        maxColumnIndex = Math.Max(maxColumnIndex, thisRow.GetUpperBound(0))
        If multiValueColumnIndex < 0 Then
            ' No multi-value column found yet. The function supports at most one
            ' multi-value column: look for a comma in this row and, if found,
            ' make the column containing it the multi-value column.
            foundComma = row.IndexOf(MultDelimiter)
            If foundComma >= 0 Then
                ' Text before the comma: Substring's second argument is a length,
                ' so pass foundComma itself.
                Dim beforeComma As String = row.Substring(0, foundComma)
                ' The column index is the number of column delimiters before the
                ' comma. Faster than splitting into an array and searching it.
                multiValueColumnIndex = beforeComma.Length - beforeComma.Replace(ColDelimiter, "").Length
            End If
        End If
    Next

    ' If no multi-value column was found, pretend it is the first column; the
    ' logic is simpler if one is always assumed to exist.
    If multiValueColumnIndex < 0 Then multiValueColumnIndex = 0

    ' lastRow holds, per column, the value used to fill down blanks on later rows.
    ' It is sized to the widest row found, and every element starts as "".
    Dim lastRow(maxColumnIndex) As String
    For i As Integer = 0 To maxColumnIndex
        lastRow(i) = ""
    Next

    Dim outputTable As New System.Text.StringBuilder()
    Dim thisVal As String
    Dim multiValueColumnValues() As String
    For Each thisRow In rows
        ' Get the multi-value column's data first so we know how many times to repeat the row.
        If thisRow.GetUpperBound(0) < multiValueColumnIndex Then
            ' The multi-value column lies beyond the jagged edge of this row:
            ' use an array holding one blank value.
            multiValueColumnValues = Split("", MultDelimiter) ' Split("") returns {""}, so GetUpperBound(0) = 0
        Else
            multiValueColumnValues = Split(thisRow(multiValueColumnIndex), MultDelimiter)
        End If
        ' Repeat this row once per value found in the multi-value column.
        For rowRepeat As Integer = 0 To multiValueColumnValues.GetUpperBound(0)
            For columnIndex As Integer = 0 To maxColumnIndex
                If columnIndex = multiValueColumnIndex Then
                    ' Value is one of the multi-value column's values.
                    thisVal = multiValueColumnValues(rowRepeat)
                ElseIf thisRow.GetUpperBound(0) < columnIndex Then
                    ' This row's jagged edge has already ended; default to blank.
                    thisVal = ""
                Else
                    thisVal = thisRow(columnIndex)
                End If
                If thisVal = "" Then
                    ' Fill down.
                    thisVal = lastRow(columnIndex)
                Else
                    ' Update the fill-down value for the following rows. (Fill-down
                    ' carries only the *last* value of the multi-value column, not
                    ' the whole set.)
                    lastRow(columnIndex) = thisVal
                End If
                If columnIndex > 0 Then outputTable.Append(ColDelimiter)
                outputTable.Append(thisVal)
            Next
            outputTable.Append(RowDelimiter)
        Next
    Next
    Return outputTable.ToString()
End Function
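On the LINQ question: fill-down carries state from one row to the next, which LINQ expresses awkwardly, so one reasonable split is a plain loop for the fill-down and `SelectMany` for the one-row-per-value expansion. This is a minimal sketch under these assumptions: any number of columns, at most one cell per row containing commas (if several do, only the first is expanded), and, as in the function above, only the last of several values carried down to later blank cells. `FlattenTable` and `TableFlattening` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic

Module TableFlattening
    ' Sketch: fill-down carries state between rows, so it stays a plain loop;
    ' the one-row-per-value expansion is a LINQ SelectMany.
    ' Assumptions: at most one cell per row contains commas (only the first such
    ' cell is expanded), and only the last of several comma-separated values is
    ' carried down to later blank cells.
    Public Function FlattenTable(input As String) As String
        Dim lines As String() = input.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)
        Dim rows As List(Of String()) = lines.Select(Function(line As String) line.Split(ControlChars.Tab)).ToList()
        If rows.Count = 0 Then Return ""
        Dim width As Integer = rows.Max(Function(r As String()) r.Length)

        ' Pass 1: fill down. lastSeen(i) is the value copied into a blank cell of column i.
        Dim lastSeen As String() = New String(width - 1) {}
        Dim filled As New List(Of String())
        For Each row As String() In rows
            Dim cells As String() = New String(width - 1) {}
            For i As Integer = 0 To width - 1
                Dim cell As String = If(i < row.Length, row(i), "")
                If cell.Length = 0 Then
                    cell = If(lastSeen(i), "")
                Else
                    lastSeen(i) = cell.Split(","c).Last()
                End If
                cells(i) = cell
            Next
            filled.Add(cells)
        Next

        ' Pass 2: expand the comma-separated cell (if any) into one row per value.
        Dim outputRows As IEnumerable(Of String) = filled.SelectMany(
            Function(filledRow As String()) As IEnumerable(Of String)
                Dim multiIndex As Integer = Array.FindIndex(filledRow, Function(c As String) c.Contains(","))
                If multiIndex < 0 Then Return New String() {String.Join(vbTab, filledRow)}
                Return filledRow(multiIndex).Split(","c).Select(
                    Function(value As String) As String
                        Dim copy As String() = filledRow.ToArray()
                        copy(multiIndex) = value
                        Return String.Join(vbTab, copy)
                    End Function)
            End Function)

        Return String.Join(vbCrLf, outputRows)
    End Function
End Module
```

For the sample input, this should produce the eight desired rows joined with CRLF, with no trailing delimiter. Unlike the function above, it does not need to locate the multi-value column up front, because each row is checked for a comma independently.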
Answer 0 (score: 0)
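The following works for a fixed number of columns. How would you want to handle an arbitrary number of columns? Can you give an example?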
' Works only for this fixed three-column layout: column 0 is filled down,
' column 1 holds the comma-separated values, column 2 is copied as-is.
Dim result As New System.Text.StringBuilder
Dim fakeData As String = _
    "R1" & vbTab & "Pears,Apples,Bananas" & vbTab & "10" & vbCrLf & _
    vbTab & "Oranges" & vbTab & "5" & vbCrLf & _
    "R2" & vbTab & "Apricots" & vbTab & "15" & vbCrLf & _
    vbTab & "Bananas" & vbTab & "222" & vbCrLf & _
    vbTab & "Apples,Oranges" & vbTab & "15"
Dim allLines() As String = Microsoft.VisualBasic.Split(fakeData, vbCrLf)
Dim firstColText As String = String.Empty ' last non-blank value seen in column 0
For Each line As String In allLines
    Dim allCols() As String = Microsoft.VisualBasic.Split(line, vbTab)
    Dim allFruits() As String = Microsoft.VisualBasic.Split(allCols(1), ",")
    If allCols(0).Length > 0 Then firstColText = allCols(0)
    ' Emit one output row per comma-separated value in column 1.
    For Each fruit As String In allFruits
        result.Append(firstColText).Append(vbTab).Append(fruit).Append(vbTab).Append(allCols(2)).Append(vbCrLf)
    Next
Next
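One way to extend this answer to an arbitrary number of columns is to remember the previous output row for every column rather than only the first, and to take the multi-value column's index as a parameter. A sketch under those assumptions; `Flatten`, `AnswerGeneralized`, and the `multiValueColumn` parameter are hypothetical, and the index is assumed to lie within the widest row.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module AnswerGeneralized
    ' Hypothetical generalization: any number of columns, every blank cell filled
    ' down, and the column at multiValueColumn (caller-supplied, assumed to be
    ' within the widest row) expanded into one output row per comma-separated value.
    Public Function Flatten(data As String, multiValueColumn As Integer) As String
        Dim result As New StringBuilder()
        Dim lines As String() = data.Split(New String() {vbCrLf}, StringSplitOptions.RemoveEmptyEntries)

        ' The widest row determines how many columns every output row gets.
        Dim width As Integer = 0
        For Each line As String In lines
            width = Math.Max(width, line.Split(ControlChars.Tab).Length)
        Next

        ' Previous output row, used to fill down blank cells (all Nothing at first).
        Dim lastSeen As String() = New String(width - 1) {}
        For Each line As String In lines
            Dim cols As String() = line.Split(ControlChars.Tab)
            Dim current As String() = New String(width - 1) {}
            For i As Integer = 0 To width - 1
                Dim cell As String = If(i < cols.Length, cols(i), "")
                If cell.Length > 0 Then
                    current(i) = cell
                Else
                    current(i) = If(lastSeen(i), "")
                End If
            Next

            ' One output row per value; the other columns are repeated unchanged.
            For Each value As String In current(multiValueColumn).Split(","c)
                current(multiValueColumn) = value
                result.Append(String.Join(vbTab, current)).Append(vbCrLf)
            Next

            ' After the loop the multi-value column holds its last value, so only
            ' that value is carried down to later blank cells.
            lastSeen = current
        Next
        Return result.ToString()
    End Function
End Module
```

Called as `Flatten(fakeData, 1)` on the answer's sample data, this should yield the eight desired rows, each followed by CRLF as in the answer's own loop.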