Parsing a long string in VBA based on different delimiter characters

Time: 2016-03-12 05:19:25

Tags: string vba excel-vba split substring

I'm stumped. I need to parse a long string like this one:

2003|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Ford|Thunderbird 2002|Lincoln|LS 2002|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Jaguar|S-Type|Base Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2000|Lincoln|LS 2003|Lincoln|LS 2001|Lincoln|LS 2003|Ford|Thunderbird 2004|Lincoln|LS 2004|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2004|Ford|Thunderbird 2005|Jaguar|S-Type|Sport Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Lincoln|LS 2004|Jaguar|XJ8 2005|Jaguar|S-Type|Sport Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. V6 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2006|Jaguar|S-Type|VDP Edition Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2005|Jaguar|XJ8 2004|Jaguar|S-Type|Base Sedan 4-Door|3.0L 183Cu. In. 
V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2005|Ford|Thunderbird 2006|Lincoln|LS 2000|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2001|Jaguar|S-Type|Sport Sedan 4-Door|4.0L 3996CC 244Cu. In. V8 GAS DOHC Naturally Aspirated::To VIN # N52047 2002|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2005|Jaguar|S-Type|Sport Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2004|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 2004|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2005|Jaguar|S-Type|Sport Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2005|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base / Sport To VIN # N52047 2001|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::To VIN # N52047 2003|Jaguar|S-Type|Base Sedan 4-Door|3.0L 2967CC 181Cu. In. V6 GAS DOHC Naturally Aspirated::Base To VIN # N52047 2006|Jaguar|S-Type|Base Sedan 4-Door|4.2L 4196CC 256Cu. In. V8 GAS DOHC Naturally Aspirated::Base / VDP Edition To VIN # N52047 

[Image: Better structure]

I know my final table has 6 columns: 3 are required (Year, Make, Model) and 3 are optional (Trim, Engine, Notes).

The Engine value is joined to Notes by the characters "::"; all other fields are separated by the "|" character.
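The six-column layout described above can be illustrated with a small helper that turns a single record into a fixed-width array. This is a minimal sketch under stated assumptions: the function name `SplitRecord` is hypothetical, it assumes the long string has already been cut into individual records, and it assumes "::" appears only between Engine and Notes. Missing optional fields (Trim, Engine, Notes) are left as empty strings.

```vba
' Hypothetical helper: turn one record such as
' "2003|Jaguar|S-Type|Base Sedan 4-Door|4.2L ...::Base To VIN # N52047"
' into a 6-element array (Year, Make, Model, Trim, Engine, Notes).
' Assumes "::" only ever separates Engine from Notes.
Function SplitRecord(ByVal rec As String) As Variant
    Dim parts As Variant
    Dim fields(0 To 5) As String   ' unset optional fields stay ""
    Dim i As Long

    ' treat the Engine::Notes separator like any other field separator
    rec = Replace(Trim$(rec), "::", "|")
    parts = Split(rec, "|")

    ' copy at most six fields; shorter records leave Trim/Engine/Notes empty
    For i = 0 To UBound(parts)
        If i > 5 Then Exit For
        fields(i) = Trim$(parts(i))
    Next i

    SplitRecord = fields
End Function
```

For example, `SplitRecord("2002|Ford|Thunderbird")` puts Year, Make and Model in elements 0 to 2 and leaves elements 3 to 5 empty, so every record fills the same six columns.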

[Image: Final table]

Here is part of my code; it doesn't work correctly. Any suggestions and improvements are welcome and appreciated :)

Dim Ret As Variant          ' fields after splitting on "|"
Dim Ret2 As Variant         ' sub-fields after splitting on "::"
Dim strColumnA As String
Dim i As Long, j As Long, k As Long, h As Long

strColumnA = wsTestComp.Range("A1")   ' the long string to parse
Ret = Split(strColumnA, "|")
j = 1   ' output column offset
k = 1   ' output row offset
For i = LBound(Ret) To UBound(Ret)

    Debug.Print Ret(i)
    If IsNumeric(Ret(i)) Then
        wsTestComp.Range("A2").Offset(k, j).value = Ret(i)
        j = j + 1
    Else
        ' a field ending in four digits also carries the next record's year
        If IsNumeric(Right(Ret(i), 4)) Then
            Ret2 = Split(Ret(i), "::")
            For h = LBound(Ret2) To UBound(Ret2)
                If IsNumeric(Right(Ret(i), 4)) Then
                    ' drop the last 5 characters (" yyyy")
                    wsTestComp.Range("A2").Offset(k, j).value = Left(Ret2(h), Len(Ret2(h)) - 5)
                Else
                    wsTestComp.Range("A2").Offset(k, j).value = Ret2(h)
                    j = j + 1
                End If
            Next h

            k = k + 1
        Else
            wsTestComp.Range("A2").Offset(k, j).value = Ret(i)
            j = j + 1
        End If
    End If

Next i

2 Answers:

Answer 0 (score: 1)

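Use VBScript.RegExp to find the vehicle years and replace the existing pattern with one that can be distinguished from the rest of the clutter, so that you can use the Split function. The double colons can be handled with a simple Replace function.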

Sub makeCars()
    Dim tmp As String, y As Long, bUSE_REGEX As Boolean
    Dim pattern As String, replacement As String
    Dim rgx As Object, cmat As Object
    Dim v1 As Variant, v2 As Variant

    bUSE_REGEX = True

    With Worksheets("Sheet1")
        tmp = .Range("A1").Value2
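        'turn the "::" between Engine and Notes into an ordinary "|"
        tmp = Replace(tmp, Chr(58) & Chr(58), Chr(124))
        'then swap every "|" for a "§" (Chr(167)) field marker
        tmp = Replace(tmp, Chr(124), Chr(167))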
    End With

    If bUSE_REGEX Then
        'REGEX method
        Set rgx = CreateObject("VBScript.RegExp")
        With rgx
            .Global = True
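            'find each " yyyy§", i.e. the space before the year that starts a record
            .pattern = "\s[0-9]{4}\§"
            Set cmat = .Execute(tmp)
            For y = 0 To cmat.Count - 1
                'swap that space for a "¶" (Chr(182)) record marker
                replacement = Replace(cmat(y), Chr(32), Chr(182))
                tmp = Replace(tmp, cmat(y), replacement)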
            Next y
        End With
    Else
        'non-REGEX method
        For y = 1950 To 2025
            tmp = Replace(tmp, Chr(32) & y & Chr(167), Chr(182) & y & Chr(167))
        Next y
    End If

    With Worksheets("Sheet1")
        'one row per record (split on ¶), one column per field (split on §)
        v1 = Split(tmp, Chr(182))
        For y = LBound(v1) To UBound(v1)
            v2 = Split(v1(y), Chr(167))
            .Cells(y + 2, 1).Resize(1, UBound(v2) + 1) = v2
        Next y
    End With

End Sub

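I've included an alternative to the RegEx solution that simply loops through about 75 years of models (1950 to 2025). While a bit "brute force", it gets the job done, and the difference between the two methods is hard to measure even in milliseconds. It works here because the range of possible years is reasonably limited; RegEx should be used when there is a wider range of possibilities.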

[Image: regex_car_models]

Answer 1 (score: 1)

The key is identifying the years.

Here is the "bare" code:

Option Explicit

Sub parsestring()

Dim Ret As Variant
Dim i As Long
Dim rng As Range

Set rng = ThisWorkbook.Worksheets("parse").Cells(1, 1) '<== cell with the string to parse

Ret = Split(Replace(Replace(rng.Value, "|", " |"), "::", " |"), " ")
For i = LBound(Ret) To UBound(Ret)
    If Ret(i) Like "####" Then Ret(i) = "§§" & Ret(i)
Next i
Ret = Split(Join(Ret), "§§")

With rng.Offset(2, 2) '<== the "database" will be placed two rows and columns away from the cell with the string to parse
    .Resize(UBound(Ret) + 1) = WorksheetFunction.Transpose(Ret)
    .Resize(UBound(Ret) + 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, Other:=True, OtherChar:="|"
    .CurrentRegion.EntireColumn.AutoFit
End With

End Sub

And here it is with some light formatting and data sorting:

Sub parsestring2()

Dim Ret As Variant
Dim i As Long
Dim rng As Range

Set rng = ThisWorkbook.Worksheets("parse").Cells(1, 1) '<== cell with the string to parse


Ret = Split(Replace(Replace(rng.Value, "|", " |"), "::", " |"), " ")
For i = LBound(Ret) To UBound(Ret)
    If Ret(i) Like "####" Then Ret(i) = "§§" & Ret(i)
Next i
Ret = Split(Join(Ret), "§§")

With rng.Offset(2, 2) '<== the "database" will be placed two rows and columns away from the cell with the string to parse
    .Resize(UBound(Ret) + 1) = WorksheetFunction.Transpose(Ret)
    .Resize(UBound(Ret) + 1).TextToColumns Destination:=.Cells(1, 1), DataType:=xlDelimited, Other:=True, OtherChar:="|"
    With .Resize(1, 6)
        .Value = Array("Year", "Make", "Model", "Trim", "Engine", "Notes")
        .Interior.ColorIndex = 16
        .Font.ColorIndex = 2
    End With
    .CurrentRegion.Sort key1:="Year", order1:=xlDescending, key2:="Make", order2:=xlAscending, key3:="Model", order3:=xlAscending, header:=xlYes
    .CurrentRegion.EntireColumn.AutoFit
End With

End Sub
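The year-detection idea from this answer can also be sketched without the worksheet round trip through `TextToColumns`. The function below is a hypothetical helper, not part of either answer: it assumes every record starts with a four-digit year immediately followed by "|" and that records are separated by a single space, so it starts a new record wherever VBA's `Like` operator matches " ####|".

```vba
' Hypothetical helper (not from either answer): split the whole string into
' records, starting a new record wherever a space is followed by four digits
' and a "|" (the start of e.g. "2003|Jaguar|...").
Function SplitIntoRecords(ByVal s As String) As Collection
    Dim records As New Collection
    Dim startPos As Long, p As Long

    s = Trim$(s)
    startPos = 1
    ' the first record starts at position 1, so scanning begins at 2
    For p = 2 To Len(s) - 5
        If Mid$(s, p, 6) Like " ####|" Then
            ' close the current record just before the separating space
            records.Add Trim$(Mid$(s, startPos, p - startPos))
            startPos = p + 1
        End If
    Next p
    ' whatever remains after the last boundary is the final record
    If Len(s) > 0 Then records.Add Trim$(Mid$(s, startPos))

    Set SplitIntoRecords = records
End Function
```

Because the match requires the "|" right after the four digits, values such as "4196CC" or "N52047" are not mistaken for record boundaries. Each returned item can then be split into fields with `Split(Replace(item, "::", "|"), "|")`.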