I have a tab-delimited file in the following format:
MY, 12 MOM, 56 {INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH [SAMPLE CODE] BOLE IRTH. SAMPLE 678 1213y12415
ZINC, 34, ABC,78 {INT-SAM}CAST IRTH;{INT-SAM}ZXYZ DEFG TH [SAMPLE CODE] BEEB ABCD EFGH. SAMPLE 901 101 9y8 1617
M 12 M 56 {INT-SAM}BARS ABCD;{INT-SAM}CHEC IR TH;{INT-NUM}132435 [SAMPLE CODE] BOLE XYZR. SAMPLE LOTS WINTER
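I need to use a VBScript regular expression to remove the spaces in the text between {INT-SAM} and ; or \t, and between [SAMPLE CODE] and the period (\.). In addition, I need to make sure that I only remove spaces in columns 3 and 4. If these patterns occur anywhere else, they should be left as is.
This is what I have so far: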
Option Explicit
Dim objShell : Set objShell = CreateObject("WScript.Shell")
Dim fso : Set fso = CreateObject("Scripting.FileSystemObject")
Dim objRegEx : Set objRegEx = CreateObject("VBScript.RegExp")
Dim OriginalFile : OriginalFile = "test.csv"
Dim PatchedFile : PatchedFile = "patch.csv"
Dim objFile, objOutFile, strSearchString, Line, strNewLine
Dim strReplaceString, ptrn, StringPattern, strWriteLine, strWriteLine2
Dim strStringToBeReplaced, strStringTest
Call ReplaceP
Sub ReplaceP
    Set objFile = fso.OpenTextFile(OriginalFile)
    Set objOutFile = fso.CreateTextFile(PatchedFile, True)
    Do While objFile.AtEndOfStream <> True
        Line = objFile.ReadLine
        objOutFile.Write(Line & vbCrlf)
        strWriteLine = RegexRemoveSpaces(Line, "(([\S ]+\t){2})((\{INT-SAM\}[0-9A-Z ]+[;|\t])+)(([\S ]+\t){3})")
        objOutFile.Write(strWriteLine & vbCrlf)
        strWriteLine = RegexRemoveSpaces(strWriteLine, "(([\S ]+\t){3})((\[SAMPLE CODE\][0-9A-Z ;]+\.))(([\S ]+\t){2})")
        objOutFile.Write(strWriteLine & vbCrlf)
    Loop
    objFile.Close
    objOutFile.Close
End Sub
Function RegexRemoveSpaces(strString, strPattern)
    objRegEx.Global = True
    objRegEx.Pattern = strPattern
    strStringTest = strString
    If objRegEx.Test(strStringTest) Then
        Set StringPattern = objRegEx.Execute(strStringTest)
        WScript.echo StringPattern.Count
        For Each ptrn In StringPattern
            WScript.Echo ptrn.value
            strStringToBeReplaced = objRegEx.Replace(ptrn.value, "$2") 'Doesn't Work
            WScript.Echo strStringToBeReplaced
            strReplaceString = Replace(strStringToBeReplaced, " ", "")
            strNewLine = objRegEx.Replace(strString, "$1" & strReplaceString & "$3" )
        Next
        If InStr(strNewLine,"[SAMPLECODE]") > 0 Then
            strNewLine = Replace(strNewLine, "[SAMPLECODE]", "[SAMPLE CODE]")
        End If
        RegexRemoveSpaces = strNewLine
    Else
        RegexRemoveSpaces = strString
    End If
End Function
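When I run the code above, the following line seems to give me unreliable results:
strStringToBeReplaced = objRegEx.Replace(ptrn.value, "$2")
For the first line, this is the output I get when I use different group numbers:
"$1" - MY, 12 MOM, 56 - this is what I expected.
"$2" - MOM, 56 - if the above is $1, why is this $2?
"$3" - {INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH - this is what I expected $2 to be.
I am not sure whether I am overcomplicating the solution. I think I am nesting the () incorrectly, but I can't figure out what is going on.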
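For reference, the numbering reported above is what left-to-right counting of opening parentheses would produce: in (([\S ]+\t){2})(...), the outer group is $1, the repeated inner group is $2 (holding only its last repetition), and the {INT-SAM} block only arrives at $3. The following is a minimal sketch for inspecting this, not part of the original script; the sample row is hypothetical and rebuilds the tabs with vbTab, and the pattern is just the first half of the one in the question.

```vbscript
Option Explicit

Dim re, matches, m, i, sample
Set re = New RegExp
' Prefix of the question's first pattern. Groups are numbered by the
' position of their opening parenthesis, counted from the left.
re.Pattern = "(([\S ]+\t){2})((\{INT-SAM\}[0-9A-Z ]+[;\t])+)"
re.Global = False

' Hypothetical row: the tabs lost in the post are rebuilt with vbTab.
sample = "MY, 12" & vbTab & "MOM, 56" & vbTab & _
         "{INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH" & vbTab & "rest"

Set matches = re.Execute(sample)
If matches.Count > 0 Then
    Set m = matches(0)
    ' SubMatches is zero-based: SubMatches(0) is $1, SubMatches(1) is $2, ...
    For i = 0 To m.SubMatches.Count - 1
        WScript.Echo "$" & (i + 1) & " = [" & m.SubMatches(i) & "]"
    Next
End If
```

Run with cscript, this should list four groups, with $3 holding the whole {INT-SAM} block and $4 only its last item. Where a group is needed only for repetition, (?:...) keeps it out of the numbering. Note also that | is a literal character inside a character class, so [;|\t] in the question accepts a pipe as well; [;\t] is probably what was meant.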
Answer 0 (score: 0)
You want to use a replacement function here. Something like this:
filename = "C:\path\to\input.txt"
Set fso = CreateObject("Scripting.FileSystemObject")
txt = Split(fso.OpenTextFile(filename).ReadAll, vbNewLine)
Set re = New RegExp
re.Pattern = "\{INT-SAM\}(.*?)[;\t]|\[SAMPLE CODE\](.*?)\."
re.Global = True
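' Called once per match: m is the whole match, s1 and s2 are the two
' capture groups, pos is the match offset and src is the source string.
Function cleanup(m, s1, s2, pos, src)
    ' Only one alternative matches at a time, so one group is Empty.
    If Not IsEmpty(s1) Then
        s = s1
    Else
        s = s2
    End If
    ' Strip the spaces from the captured text inside the match.
    cleanup = Replace(m, s, Replace(s, " ", ""))
End Function
For Each line In txt
    WScript.Echo re.Replace(line, GetRef("cleanup"))
Next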
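The pattern above matches wherever the markers occur, while the question asks for columns 3 and 4 only. One possible way to add that restriction is to split each row on tabs and run the replacement on those two fields alone. The sketch below assumes the GetRef callback works as in the code above; CleanColumns and the sample row are hypothetical names introduced for illustration, and the pattern ends an {INT-SAM} item at ; or end of field, because the trailing tab disappears after the split.

```vbscript
Option Explicit

Dim re : Set re = New RegExp
' Inside a single field there is no trailing tab any more, so an
' {INT-SAM} item ends at ";" or at the end of the field.
re.Pattern = "\{INT-SAM\}(.*?)(?:;|$)|\[SAMPLE CODE\](.*?)\."
re.Global = True

' Same callback idea as above: one of s1/s2 is Empty for any given match.
Function cleanup(m, s1, s2, pos, src)
    Dim s
    If Not IsEmpty(s1) Then
        s = s1
    Else
        s = s2
    End If
    cleanup = Replace(m, s, Replace(s, " ", ""))
End Function

' Hypothetical helper: strips spaces only in columns 3 and 4 of one row.
Function CleanColumns(row)
    Dim fields, i
    fields = Split(row, vbTab)
    For i = 2 To 3                  ' zero-based indexes of columns 3 and 4
        If i <= UBound(fields) Then
            fields(i) = re.Replace(fields(i), GetRef("cleanup"))
        End If
    Next
    CleanColumns = Join(fields, vbTab)
End Function

' Hypothetical sample row with the tabs written out explicitly.
Dim sample
sample = "MY, 12" & vbTab & "MOM, 56" & vbTab & _
         "{INT-SAM}BARS ABCD;{INT-SAM}CHEC ABC TH" & vbTab & _
         "[SAMPLE CODE] BOLE IRTH." & vbTab & "SAMPLE" & vbTab & "678"
WScript.Echo CleanColumns(sample)
```

With this layout, the same markers appearing in any other column are never passed to the regular expression, so they stay untouched. Splitting first also avoids having to count tab-separated fields inside the pattern itself.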