Remove spaces from part of a pattern

Time: 2018-04-16 08:39:35

Tags: regex vbscript

I have a tab-delimited file in the following format:

MY, 12    MOM, 56    {INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH    [SAMPLE CODE] BOLE IRTH.    SAMPLE 678    1213y12415
ZINC, 34,    ABC,78    {INT-SAM}CAST IRTH;{INT-SAM}ZXYZ DEFG TH    [SAMPLE CODE] BEEB ABCD EFGH.    SAMPLE 901    101 9y8 1617
M 12    M 56    {INT-SAM}BARS ABCD;{INT-SAM}CHEC IR TH;{INT-NUM}132435    [SAMPLE CODE] BOLE XYZR.    SAMPLE LOTS    WINTER

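I need to use a VBScript regular expression to remove spaces from the text:

  1. Between {INT-SAM} and ; or \t.
  2. Between [SAMPLE CODE] and the period (\.).
  3. In addition, I need to make sure I only remove spaces in columns 3 and 4. If these patterns appear anywhere else, they should be left as they are.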

This is as far as I've got:

    Option Explicit
    
    Dim objShell : Set objShell = CreateObject("WScript.Shell")
    Dim fso : Set fso = CreateObject("Scripting.FileSystemObject")
    Dim objRegEx : Set objRegEx = CreateObject("VBScript.RegExp")
    
    Dim OriginalFile : OriginalFile = "test.csv"
    Dim PatchedFile : PatchedFile = "patch.csv"
    
    Dim objFile, objOutFile, strSearchString, Line, strNewLine
    Dim strReplaceString, ptrn, StringPattern, strWriteLine, strWriteLine2
    Dim strStringToBeReplaced, strStringTest
    
    Call ReplaceP
    
    Sub ReplaceP
        Set objFile = fso.OpenTextFile(OriginalFile)
        Set objOutFile = fso.CreateTextFile(PatchedFile, True)
    
        Do While objFile.AtEndOfStream <> True
            Line = objFile.ReadLine
            objOutFile.Write(Line & vbCrlf)
            strWriteLine = RegexRemoveSpaces(Line, "(([\S ]+\t){2})((\{INT-SAM\}[0-9A-Z ]+[;|\t])+)(([\S ]+\t){3})")
            objOutFile.Write(strWriteLine & vbCrlf)
            strWriteLine = RegexRemoveSpaces(strWriteLine, "(([\S ]+\t){3})((\[SAMPLE CODE\][0-9A-Z ;]+\.))(([\S ]+\t){2})")
            objOutFile.Write(strWriteLine & vbCrlf)
        Loop
        objFile.Close
        objOutFile.Close
    End Sub
    
    Function RegexRemoveSpaces(strString, strPattern)
        objRegEx.Global = True
        objRegEx.Pattern = strPattern
        strStringTest = strString
    
        If objRegEx.Test(strStringTest) Then
            Set StringPattern = objRegEx.Execute(strStringTest)
            WScript.echo StringPattern.Count
            For Each ptrn In StringPattern
                WScript.Echo ptrn.value
                strStringToBeReplaced = objRegEx.Replace(ptrn.value, "$2") 'Doesn't Work
                WScript.Echo strStringToBeReplaced
                strReplaceString = Replace(strStringToBeReplaced, " ", "")
                strNewLine = objRegEx.Replace(strString, "$1" & strReplaceString & "$3" )
            Next
            If InStr(strNewLine,"[SAMPLECODE]") > 0 Then
                strNewLine = Replace(strNewLine, "[SAMPLECODE]", "[SAMPLE CODE]")
            End If
            RegexRemoveSpaces = strNewLine
        Else
            RegexRemoveSpaces = strString
        End If
    End Function
    

When I run the code above, this line seems to give me unreliable results:

    strStringToBeReplaced = objRegEx.Replace(ptrn.value, "$2")
    

For the first line, this is the output I get with different group numbers:

"$1" - MY, 12 MOM, 56 - this is what I expected.

"$2" - MOM, 56 - if the above is $1, why is this $2?

"$3" - {INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH - this is what I was hoping $2 would be.

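I'm not sure whether I'm overcomplicating the solution. I think I'm nesting the parentheses incorrectly, but I can't figure out what is going on.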
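Capturing groups are numbered by the position of their opening parenthesis, counted from the left, and nested groups are counted too. In the first pattern, `(([\S ]+\t){2})` therefore contains two groups: $1 is the outer group (columns 1 and 2 together) and $2 is the inner `([\S ]+\t)`. Because that inner group is repeated by `{2}`, it keeps only its last repetition, which is why $2 is `MOM, 56`; the `{INT-SAM}` part only starts at $3. A minimal sketch that prints every group of that pattern for the first sample line, assuming the columns are separated by real tab characters:

```vbscript
Option Explicit

' Print every capturing group of the first pattern from the question,
' so that the group numbering becomes visible.
Dim re, matches, m, i, sample
Set re = New RegExp
re.Pattern = "(([\S ]+\t){2})((\{INT-SAM\}[0-9A-Z ]+[;|\t])+)(([\S ]+\t){3})"

' First sample line, rebuilt with real tab characters between the columns (assumption).
sample = "MY, 12" & vbTab & "MOM, 56" & vbTab & _
         "{INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH" & vbTab & _
         "[SAMPLE CODE] BOLE IRTH." & vbTab & "SAMPLE 678" & vbTab & "1213y12415"

Set matches = re.Execute(sample)   ' Global is False by default: at most one match
If matches.Count > 0 Then
    Set m = matches(0)
    ' SubMatches is 0-based: SubMatches(0) is $1, SubMatches(1) is $2, and so on.
    For i = 0 To m.SubMatches.Count - 1
        WScript.Echo "$" & (i + 1) & " = [" & m.SubMatches(i) & "]"
    Next
End If
```

To keep a group out of the numbering, make it non-capturing with `(?:...)`, for example `((?:[\S ]+\t){2})`. Note also that `|` inside a character class such as `[;|\t]` is a literal pipe, not alternation; `[;\t]` is enough.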

1 answer:

Answer 0 (score: 0)

You want to use a replacement function here. Something like this:

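    filename = "C:\path\to\input.txt"

    Set fso = CreateObject("Scripting.FileSystemObject")
    txt = Split(fso.OpenTextFile(filename).ReadAll, vbNewLine)

    Set re = New RegExp
    ' Group 1: text after {INT-SAM} up to ";" or tab; group 2: text after [SAMPLE CODE] up to "."
    re.Pattern = "\{INT-SAM\}(.*?)[;\t]|\[SAMPLE CODE\](.*?)\."
    re.Global  = True

    ' Called once per match with the full match, one argument per capturing group,
    ' then the match position and the source string.
    Function cleanup(m, s1, s2, pos, src)
        ' Only one alternative matched, so use whichever group holds the captured text.
        If Not IsEmpty(s1) Then
            s = s1
        Else
            s = s2
        End If
        ' Swap the captured text inside the match for a copy without spaces.
        cleanup = Replace(m, s, Replace(s, " ", ""))
    End Function

    For Each line In txt
        ' Passing GetRef("cleanup") makes Replace call the function for every match.
        WScript.Echo re.Replace(line, GetRef("cleanup"))
    Next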
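The regex above runs over the whole line, so it does not enforce requirement 3 (only columns 3 and 4). A minimal sketch that keeps the same replacement-function idea but applies it per column, assuming exactly one tab between columns; the names `reInt`, `reCode` and `CleanLine` are hypothetical, and the two patterns are adapted from the answer's rather than copied:

```vbscript
Option Explicit

' Sketch: remove the spaces only inside columns 3 and 4 of a tab-delimited line.
' Assumes exactly one tab character between columns.
Dim reInt, reCode, sample

Set reInt = New RegExp
reInt.Pattern = "\{INT-SAM\}([^;]*)"             ' after {INT-SAM}, up to ";" or the end of the field
reInt.Global = True

Set reCode = New RegExp
reCode.Pattern = "\[SAMPLE CODE\]([^.]*)(?=\.)"   ' after [SAMPLE CODE], up to (not including) "."
reCode.Global = True

' Replacement function, passed to Replace through GetRef as in the answer:
' full match, the single capturing group, match position, source string.
' The group sits at the end of each match, so the marker in front of it is
' kept verbatim and only the captured text loses its spaces.
Function cleanup(m, s1, pos, src)
    cleanup = Left(m, Len(m) - Len(s1)) & Replace(s1, " ", "")
End Function

' Hypothetical helper: clean one line, leaving every other column untouched.
Function CleanLine(text)
    Dim fields
    fields = Split(text, vbTab)
    ' Split returns a 0-based array, so columns 3 and 4 are fields(2) and fields(3).
    If UBound(fields) >= 2 Then fields(2) = reInt.Replace(fields(2), GetRef("cleanup"))
    If UBound(fields) >= 3 Then fields(3) = reCode.Replace(fields(3), GetRef("cleanup"))
    CleanLine = Join(fields, vbTab)
End Function

' Third sample line from the question, rebuilt with real tabs.
sample = "M 12" & vbTab & "M 56" & vbTab & _
         "{INT-SAM}BARS ABCD;{INT-SAM}CHEC IR TH;{INT-NUM}132435" & vbTab & _
         "[SAMPLE CODE] BOLE XYZR." & vbTab & "SAMPLE LOTS" & vbTab & "WINTER"
WScript.Echo CleanLine(sample)
```

Splitting on vbTab first means the `{INT-SAM}` pattern no longer needs the tab as a terminator: the end of the field plays that role. Rebuilding the result from the marker plus the cleaned group, instead of calling `Replace(m, s, ...)`, also avoids touching a second copy of the captured text that might happen to occur inside the marker itself.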