当文本没有适当的定界符时,如何将文本转换为列。
例如,如何更改以下行:
变成这样:
在Excel中,“文本到”列似乎找不到正确的分隔符(空格,制表符...)。我在VBA中尝试了以下操作:
I1 = Mid(Cells(i, 1), 1, 16)
I2 = Mid(Cells(i, 1), 17, 33)
I3 = Mid(Cells(i, 1), 34, 49)
I4 = Mid(Cells(i, 1), 50, 53)
I5 = Mid(Cells(i, 1), 54, 66)
I6 = Mid(Cells(i, 1), 67, 82)
I7 = Mid(Cells(i, 1), 83, 99)
I8 = Mid(Cells(i, 1), 100, 116)
I9 = Mid(Cells(i, 1), 117, 133)
但是我知道它不适用于所有列。例如,对于I3,我得到了更多预期的值,例如:
我也尝试过替换标签(如果存在的话),例如:
MyString = Replace(MyString, vbTab, "")
但也不起作用。
还有另一种方法吗?
答案 0 :(得分:1)
这是尝试使用自定义ReplaceWhitespace
函数,该函数根据其长度依次替换空白部分。作为中间步骤,空白用分号代替;删除不必要的分号作为最后一步。 Split
is used to read将解析后的字符串转换为数组,然后array is used将结果读取到工作表中。根据您的特定需求调整ReplaceWhitespace
应该很简单。
请注意,此算法不会评估将单个空格字符的实例视为噪声(如在“未指定的管道中”)还是应视为有效的单词除错器(如在“单位成本”中)。因此,在ReplaceWhitespace
:"- -" ~~> "-;-"
和" UNASSIGNED " ~~> ";UNASSIGNED;"
假设屏幕截图中的数据位于A1:A4
范围内,则此代码或多或少会产生所需的输出,如下面的屏幕截图所示。
编辑:ReplaceWhitespace
的初始设计基于反复试验。稍加思索之后,我意识到空格字符或分号为composite number的模式将由算法中查找那些字符数为质数的模式的行处理。我已经相应地更新了代码。
Sub ParseUnstructured()
Dim i As Long
For Each cell In Range("A1:A4")
i = i + 1
' Clean whitespace:
sRow = ReplaceWhitespace(cell.Value)
' Read to array
Dim sArray() As String
sArray() = Split(sRow, ";")
' Read to worksheet:
Range("A1").Offset(5 + i).Resize(1, UBound(sArray)+1).Value = sArray
Next cell
End Sub
Function ReplaceWhitespace(sInput As String) As String
Dim sOutput As String
' Look for special cases with single-whitespace noise:
sOutput = Replace(sInput, "- -", "-;-") ' Take care of "----- ----"
sOutput = Replace(sOutput, "UNASSIGNED", ";UNASSIGNED;")
' Look for patterns where the number of "noise" characters is a prime number:
sOutput = Replace(sOutput, " ", ";") ' 7 whitespaces
sOutput = Replace(sOutput, " ", ";") ' 5
sOutput = Replace(sOutput, " ", ";") ' 3
sOutput = Replace(sOutput, " ", ";") ' 2
' sOutput = Replace(sOutput, " ", "_") ' 1 Optional
sOutput = Replace(sOutput, ";;;;;", ";") ' 5 semicolons
sOutput = Replace(sOutput, ";;;", ";") ' 3
sOutput = Replace(sOutput, ";;", ";") ' 2
sOutput = Replace(sOutput, "; ", ";") ' Takes care of some leftovers.
ReplaceWhitespace = sOutput
End Function
运行ParseUnstructured()
的结果:
答案 1 :(得分:1)
假设类别只能是几个定义的单词之一,您提供的数据确实具有规则的模式。
如果Category仅将是单个单词,则还可以假设UOM仅定义了几个单词。例如
根据该模式,我们可以构造一个正则表达式,并在VBA宏中使用它来分割行。 当然,如果模式与此不同,则该方法将无效。但是您必须提供包含所有可变性的示例。
下面的宏假定类别将是ASSIGNED
或UNASSIGNED
,但是您可以在代码中的管道分隔列表中添加更多单词。
代码中还包含其他假设。
Option Explicit
Sub parseLine()
Dim WS As Worksheet, R As Range, C As Range
Dim RE As Object, MC As Object
Dim vRes As Variant, I As Long
'Set original worksheet/range
'change to suit
'Below uses column A
Set WS = Worksheets("sheet1")
With WS
Set R = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With
'Initialize regex engine
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "^(\S+)\s+(.*)\s*\b(UNASSIGNED|ASSIGNED)\b\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
.IgnoreCase = False
.MultiLine = True
.Global = True
End With
'Iterate through; create the Parse line and parse
Application.ScreenUpdating = False
For Each C In R
If RE.Test(C.Text) = True Then
Set MC = RE.Execute(C.Text)
ReDim vRes(1 To MC(0).SubMatches.Count)
For I = 1 To UBound(vRes)
vRes(I) = MC(0).SubMatches(I - 1)
Next I
'write the results next to the column)
With C.Offset(0, 1).Resize(columnsize:=UBound(vRes))
.Clear
.NumberFormat = "@"
.Value = vRes
.EntireColumn.AutoFit
End With
End If
Next C
Application.ScreenUpdating = True
End Sub