Find and replace NUL characters using VBScript

Date: 2017-08-29 07:33:09

Tags: vbscript

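I have a text file that contains NUL characters in random lines. On each such line, I want to find the first NUL character and delete everything from that NUL character to the end of the line, like this: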

Input:

1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL

Output:

1 2 3 4 20170821
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821
1 2 3 4 20170821

I have the following code, which reads the text file into a variable, then loops over the data and replaces the NULs:

sInfile = WScript.Arguments(1)

'Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing

MsgBox("File Read Completed")

'Remove Rest of the line from NULL
Do While InStr(sData, "\00.*") > 0
    sData = Replace(sData, "\00.*", "")
Loop

'Cleanup and end
Set oFS = Nothing
WScript.Quit

The script runs without any errors, but I don't see any change in the data.

Edit 1: Updated code:

Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

If (WScript.Arguments.Count > 0) Then
    sInfile = WScript.Arguments(0)
Else
    WScript.Echo "No filename specified."
    WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
    sOutfile = WScript.Arguments(1)
Else
    sOutfile = sInfile
End If

'Get the text file from cmd file
sInfile = Wscript.Arguments(1)
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing

' Remove Rest of the line from NULL
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
re.Global  = True
sData = re.Replace(sData, "")

Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

' Cleanup and end
Set oFS = Nothing
WScript.Quit

Here is the sample input I provided:

[image: sample input]

I expected to see output like this:

[image: expected output]

But I got the following output:

੊ਊਊਊਊਊਊਊਊਊਊ

Edit 2: I'm not familiar with hex editing. Here is a hex dump of the sample input:

FF FE 4A 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00 38 
00 09 00 38 00 37 00 09 00 30 00 09 00 30 00 09 00 31 00 32 00 33 00 09 00 
32 00 30 00 31 00 37 00 09 00 31 00 32 00 33 00 34 00 09 00 31 00 33 00 34 
00 32 00 30 00 09 00 32 00 30 00 31 00 37 00 30 00 38 00 30 00 39 00 09 00 
35 00 31 00 30 00 33 00 09 00 09 00 09 00 09 00 33 00 34 00 31 00 34 00 38 
00 38 00 09 00 32 00 09 00 32 00 30 00 31 00 37 00 09 00 38 00 09 00 31 00 
09 00 37 00 09 00 2D 00 32 00 36 00 34 00 30 00 09 00 2D 00 33 00 39 00 33 
00 2E 00 31 00 36 00 31 00 33 00 37 00 35 00 09 00 2D 00 33 00 33 00 32 00 
2E 00 34 00 36 00 38 00 35 00 37 00 39 00 09 00 41 00 30 00 31 00 31 00 32 
00 35 00 38 00 39 00 2F 00 33 00 34 00 31 00 34 00 38 00 38 00 2F 00 09 00 
09 00 09 00 09 00 09 00 09 00 09 00 09 00 32 00 09 00 09 00 09 00 32 00 31 
00 37 00 38 00 31 00 09 00 58 00 59 00 5A 00 09 00 58 00 59 00 5A 00 09 00 
58 00 59 00 5A 00 09 00 31 00 32 00 33 00 09 00 31 00 32 00 33 00 09 00 2D 
00 32 00 36 00 34 00 09 00 58 00 59 00 5A 00 09 00 31 00 09 00 31 00 09 00 
31 00 32 00 33 00 09 00 09 00 09 00 32 00 31 00 37 00 38 00 32 00 31 00 0D 
00 0A 00 41 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00

And here is the hex dump of the output I get:

FF FE 4A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A

2 answers:

Answer 0: (score: 1)

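The Replace function does not perform regular expression replacements, and VBScript does not recognize \0 as the NUL character either. For the former you need the Replace method of the regular expression object; for the latter you need the Chr function. Also, you don't need a loop, because you are reading the file's content as a single string anyway.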
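To illustrate the first point, here is a minimal sketch (the sample string is made up for this example) showing that InStr treats "\00.*" as six literal characters, whereas Chr(0) produces an actual NUL character that can be searched for:

```vbscript
' Build a test string with an embedded NUL character.
Dim s
s = "abc" & Chr(0) & "def"

' "\00.*" is searched for literally; the string does not contain it.
WScript.Echo InStr(s, "\00.*")   ' prints 0

' Chr(0) is a real NUL character, found at position 4.
WScript.Echo InStr(s, Chr(0))    ' prints 4
```

Because InStr returns 0 on the first check, the Do While loop in the question never executes its body, which is why the script finishes without errors and without changing anything.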

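However, your file is apparently UTF-16 LE encoded, which means that each character is represented by 2 bytes, one of which is zero for characters in the ANSI range. If you read such a file as if it were an ANSI file, the replacement removes everything after the first byte of each character's byte pair that precedes a NUL. You need to set the 4th parameter of the OpenTextFile method to -1 to have the file handled as a UTF-16 (commonly called "Unicode" on Windows) file.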
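The effect of the format parameter can be checked with a small experiment. The following is only a sketch: the file name nul_demo.txt is a hypothetical name chosen for this example, and the result of the ANSI read is assumed to be as described on a system with a single-byte ANSI code page.

```vbscript
' Sketch: write a tiny UTF-16 LE file, then read it back as ANSI and as Unicode.
' "nul_demo.txt" is a hypothetical file name used only for this demonstration.
Dim oFSO, sPath, oFile, sAnsi, sUnicode

Set oFSO = CreateObject("Scripting.FileSystemObject")
sPath = oFSO.BuildPath(oFSO.GetSpecialFolder(2).Path, "nul_demo.txt")  ' 2 = temp folder

' Third argument True makes CreateTextFile write a Unicode (UTF-16 LE) file.
Set oFile = oFSO.CreateTextFile(sPath, True, True)
oFile.Write "AB"
oFile.Close

' Format 0 = ANSI: every byte is treated as its own character.
Set oFile = oFSO.OpenTextFile(sPath, 1, False, 0)
sAnsi = oFile.ReadAll
oFile.Close

' Format -1 = Unicode: each 2-byte pair is decoded as one character.
Set oFile = oFSO.OpenTextFile(sPath, 1, False, -1)
sUnicode = oFile.ReadAll
oFile.Close

WScript.Echo "NUL found when read as ANSI:    " & (InStr(sAnsi, Chr(0)) > 0)
WScript.Echo "NUL found when read as Unicode: " & (InStr(sUnicode, Chr(0)) > 0)

oFSO.DeleteFile sPath
```

If the ANSI read reports a NUL while the Unicode read does not, the NUL characters seen in that mode are just the zero bytes of ordinary UTF-16 characters, not NULs in the text itself.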

Change this:

Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
...
Do While InStr(sData, "\00.*") > 0
    sData = Replace(sData, "\00.*", "")
Loop
...
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing

into this:

sData = oFSO.OpenTextFile(sInfile, 1, False, -1).ReadAll

Set re = New RegExp
re.Pattern = Chr(0) & "[^\r\n]*"
re.Global  = True
sData = re.Replace(sData, "")

oFSO.OpenTextFile(sOutfile, 2, True, -1).Write sData

and the problem will go away.

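The pattern [^\r\n]* (any number of characters that are neither carriage return nor line feed) is used to keep the Windows line breaks intact. These consist of two characters, carriage return and line feed (CR-LF). The regular expression metacharacter . does not match line feed, but it does match carriage return, so the carriage returns would be removed if the pattern .* were used.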
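The difference between the two patterns can be seen on a short made-up string. This is a minimal sketch; the sample data is an assumption for illustration only:

```vbscript
' Two lines separated by CR-LF; the first line contains a NUL.
Dim s, re, r1, r2
s = "a" & Chr(0) & "b" & vbCrLf & "c"

Set re = New RegExp
re.Global = True

' ".*" stops at the line feed but swallows the carriage return.
re.Pattern = Chr(0) & ".*"
r1 = re.Replace(s, "")

' "[^\r\n]*" stops before the carriage return, so CR-LF survives.
re.Pattern = Chr(0) & "[^\r\n]*"
r2 = re.Replace(s, "")

WScript.Echo Len(r1)   ' prints 3: "a", LF, "c"
WScript.Echo Len(r2)   ' prints 4: "a", CR, LF, "c"
```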

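To be clear: the above code removes the NUL character and the rest of the line from every line that contains a NUL character. Lines that do not contain a NUL character are not affected.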

If you want to remove the entire text after the first NUL character (including all subsequent lines), you can do it like this:

Set re = New RegExp
re.Pattern = Chr(0) & "[\s\S]*"
sData = re.Replace(sData, "")

Answer 1: (score: 1)

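You are trying to pass a regular expression pattern to the Replace() function, which will not work. In general, you don't need regular expressions for this at all.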

Here is the non-regex code:

' Format 0 opens the file as ANSI; pass -1 instead for a UTF-16 file.
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, 0)
    sData = ""
    If Not .AtEndOfStream Then sData = .ReadAll
    .Close
End With

a = Split(sData, vbCrLf)
For i = 0 To UBound(a)
    q = Instr(a(i), Chr(0))
    If q > 0 Then a(i) = Mid(a(i), 1, q - 1)
Next
sData = Join(a, vbCrLf)
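The per-line step of that loop can also be written as a small helper, which makes it easy to try on a single line. This is just a sketch: TruncateAtNul is a hypothetical name, not something from the question, and the sample line is modeled on the question's input.

```vbscript
' Hypothetical helper: return the part of sLine before its first NUL,
' or the whole line if it contains no NUL.
Function TruncateAtNul(sLine)
    Dim q
    q = InStr(sLine, Chr(0))
    If q > 0 Then
        TruncateAtNul = Left(sLine, q - 1)
    Else
        TruncateAtNul = sLine
    End If
End Function

Dim sample
sample = "1 2 3 4 20170821" & Chr(0) & "20170821" & Chr(0) & " 123 " & Chr(0)

WScript.Echo TruncateAtNul(sample)                        ' prints 1 2 3 4 20170821
WScript.Echo TruncateAtNul("1 2 3 4 20170821 20170821")   ' line without NUL is unchanged
```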

Here is the regex version:

' Format 0 opens the file as ANSI; pass -1 instead for a UTF-16 file.
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, 0)
    sData = ""
    If Not .AtEndOfStream Then sData = .ReadAll
    .Close
End With

With CreateObject("VBScript.RegExp")
    .Pattern = "^(.*?)\x00.*$"
    .Global  = True
    .Multiline  = True
    sData = .Replace(sData, "$1")
End With