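I have a text file that contains NUL characters on random lines. On each such line, I want to find the first NUL character and remove the rest of the line starting from that NUL character, as shown below:
Input: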
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
1 2 3 4 20170821NUL20170821NULNULNULNUL 123 NULNULNUL
Output:
1 2 3 4 20170821
1 2 3 4 20170821 20170821 6 7 10 123 10 11 13
1 2 3 4 20170821
1 2 3 4 20170821
I have the following code, which reads the text file into a variable, then loops over the data and replaces the NUL characters:
sInfile = WScript.Arguments(1)
'Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
MsgBox("File Read Completed")
'Remove Rest of the line from NULL
Do While InStr(sData, "\00.*") > 0
sData = Replace(sData, "\00.*", "")
Loop
'Cleanup and end
Set oFS = Nothing
WScript.Quit
The script runs without any errors, but I don't see any change in the data.
Edit 1: Updated code:
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
If (WScript.Arguments.Count > 0) Then
sInfile = WScript.Arguments(0)
Else
WScript.Echo "No filename specified."
WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
sOutfile = WScript.Arguments(1)
Else
sOutfile = sInfile
End If
'Get the text file from cmd file
sInfile = Wscript.Arguments(1)
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
' Remove Rest of the line from NULL
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
re.Global = True
sData = re.Replace(sData, "")
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing
' Cleanup and end
Set oFS = Nothing
WScript.Quit
Here is the sample input I provided:
I expect to see output like this:
But I got the following output:
ਊਊਊਊਊਊਊਊਊਊ
Edit 2: I don't know much about hex editing. Here is a hex dump of the sample input:
FF FE 4A 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00 38
00 09 00 38 00 37 00 09 00 30 00 09 00 30 00 09 00 31 00 32 00 33 00 09 00
32 00 30 00 31 00 37 00 09 00 31 00 32 00 33 00 34 00 09 00 31 00 33 00 34
00 32 00 30 00 09 00 32 00 30 00 31 00 37 00 30 00 38 00 30 00 39 00 09 00
35 00 31 00 30 00 33 00 09 00 09 00 09 00 09 00 33 00 34 00 31 00 34 00 38
00 38 00 09 00 32 00 09 00 32 00 30 00 31 00 37 00 09 00 38 00 09 00 31 00
09 00 37 00 09 00 2D 00 32 00 36 00 34 00 30 00 09 00 2D 00 33 00 39 00 33
00 2E 00 31 00 36 00 31 00 33 00 37 00 35 00 09 00 2D 00 33 00 33 00 32 00
2E 00 34 00 36 00 38 00 35 00 37 00 39 00 09 00 41 00 30 00 31 00 31 00 32
00 35 00 38 00 39 00 2F 00 33 00 34 00 31 00 34 00 38 00 38 00 2F 00 09 00
09 00 09 00 09 00 09 00 09 00 09 00 09 00 32 00 09 00 09 00 09 00 32 00 31
00 37 00 38 00 31 00 09 00 58 00 59 00 5A 00 09 00 58 00 59 00 5A 00 09 00
58 00 59 00 5A 00 09 00 31 00 32 00 33 00 09 00 31 00 32 00 33 00 09 00 2D
00 32 00 36 00 34 00 09 00 58 00 59 00 5A 00 09 00 31 00 09 00 31 00 09 00
31 00 32 00 33 00 09 00 09 00 09 00 32 00 31 00 37 00 38 00 32 00 31 00 0D
00 0A 00 41 00 42 00 43 00 09 00 31 00 32 00 33 00 34 00 38 00 36 00 37 00
and this is what I get: FF FE 4A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A
Answer 0 (score: 1)
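The Replace function does not perform regular expression replacements, and VBScript does not recognize \0 as the NUL character. For the former you need the Replace method of the regular expression object; for the latter you need the Chr function. Also, you don't need a loop, because you are reading the content of the file as a single string anyway.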
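The difference can be seen on an in-memory string, without touching any file. The following is a minimal sketch to run with cscript; the sample value in sLine is made up for illustration and is not taken from the asker's file.

```vbscript
' Hypothetical sample line: two NUL characters embedded in ordinary text.
sLine = "1 2 3 4 20170821" & Chr(0) & "20170821" & Chr(0) & " 123"

' The built-in string functions treat "\00.*" as literal text, not as a pattern.
WScript.Echo InStr(sLine, "\00.*")    ' 0: that literal text does not occur
WScript.Echo InStr(sLine, Chr(0))     ' 17: position of the first NUL

' A RegExp object with Chr(0) in the pattern removes the NUL and what follows it.
Set re = New RegExp
re.Pattern = Chr(0) & ".*"
WScript.Echo re.Replace(sLine, "")    ' 1 2 3 4 20170821
```

Because InStr never finds the literal text, the Do While loop in the question exits immediately and the data is left unchanged.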
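However, your file is apparently UTF-16 LE encoded, which means that every character is represented by 2 bytes, one of which is zero for ANSI characters. If you read such a file as an ANSI file, the replacement removes everything from the first zero byte up to the next line feed byte, so almost nothing but 0A bytes survives, as the hex dump of your output shows. You need to set the 4th parameter of the OpenTextFile method to -1 to handle the file as a UTF-16 (vulgo Unicode) file.
Change this: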
Set oFS = oFSO.OpenTextFile(sInfile)
sData = oFS.ReadAll
oFS.Close
Set oFS = Nothing
...
Do While InStr(sData, "\00.*") > 0
sData = Replace(sData, "\00.*", "")
Loop
...
Set oOutfile = oFSO.OpenTextFile(sOutfile, ForWriting, True)
oOutfile.Write(sData)
oOutfile.Close
Set oOutfile = Nothing
into this:
sData = oFSO.OpenTextFile(sInfile, 1, False, -1).ReadAll
Set re = New RegExp
re.Pattern = Chr(0) & "[^\r\n]*"
re.Global = True
sData = re.Replace(sData, "")
oFSO.OpenTextFile(sOutfile, 2, True, -1).Write sData
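and the problem will disappear.
The pattern [^\r\n]* (any number of characters that are neither carriage return nor line feed) is used to keep Windows line breaks intact. These consist of the two characters carriage return and line feed (CR-LF). The regular expression metacharacter . does not match line feed, but it does match carriage return, so the carriage returns would be removed when using the pattern .* instead.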
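The effect on the line breaks can be made visible with a small in-memory test. This is only a sketch: the Show helper and the two-line sample string are hypothetical and exist purely to print CR and LF as readable markers.

```vbscript
' Hypothetical helper: make CR and LF visible in the output.
Function Show(s)
    Show = Replace(Replace(s, vbCr, "<CR>"), vbLf, "<LF>")
End Function

' Two CR-LF terminated lines; only the first one contains a NUL.
sData = "a" & Chr(0) & "x" & vbCrLf & "b" & vbCrLf

Set re = New RegExp
re.Global = True

re.Pattern = Chr(0) & ".*"          ' . also consumes the carriage return
WScript.Echo Show(re.Replace(sData, ""))   ' a<LF>b<CR><LF>

re.Pattern = Chr(0) & "[^\r\n]*"    ' stops in front of CR and LF
WScript.Echo Show(re.Replace(sData, ""))   ' a<CR><LF>b<CR><LF>
```

With the first pattern the truncated line ends in a bare LF, while the untouched line keeps its CR-LF, which leaves the file with mixed line endings.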
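For clarity: the code above removes the NUL character and the rest of the line from every line that contains a NUL character. Lines that do not contain a NUL character are not affected.
If you want to remove the entire text after the first NUL character (including all subsequent lines), you can do this instead: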
Set re = New RegExp
re.Pattern = Chr(0) & "[\s\S]*"
sData = re.Replace(sData, "")
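Answer 1 (score: 1)
You are trying to pass a regular expression pattern to the Replace() function, which will not work. In fact, you don't need regular expressions for this at all.
Here is the non-regex code: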
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, 0)
sData = ""
If Not .AtEndOfStream Then sData = .ReadAll
.Close
End With
a = Split(sData, vbCrLf)
For i = 0 To UBound(a)
q = Instr(a(i), Chr(0))
If q > 0 Then a(i) = Mid(a(i), 1, q - 1)
Next
sData = Join(a, vbCrLf)
Here is the regex version:
With CreateObject("Scripting.FileSystemObject").OpenTextFile(WScript.Arguments(1), 1, False, 0)
sData = ""
If Not .AtEndOfStream Then sData = .ReadAll
.Close
End With
With CreateObject("VBScript.RegExp")
.Pattern = "^(.*?)\x00.*$"
.Global = True
.Multiline = True
sData = .Replace(sData, "$1")
End With
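Both snippets above open the file with format 0 (ASCII) and do not write the result back. If the file is UTF-16 LE, as the hex dump in the question suggests, the line-splitting approach could be combined with the Unicode format value -1 roughly as follows. This is a sketch under assumptions: the TruncateAtNul function name is made up here, the file path is taken from the first argument, and the file is overwritten in place.

```vbscript
' Hypothetical helper: cut every CR-LF separated line at its first NUL.
Function TruncateAtNul(sText)
    Dim a, i, q
    a = Split(sText, vbCrLf)
    For i = 0 To UBound(a)
        q = InStr(a(i), Chr(0))
        If q > 0 Then a(i) = Left(a(i), q - 1)
    Next
    TruncateAtNul = Join(a, vbCrLf)
End Function

Const ForReading = 1, ForWriting = 2, TristateTrue = -1

Set oFSO = CreateObject("Scripting.FileSystemObject")
sFile = WScript.Arguments(0)   ' assumption: the path is the first argument

' Read the whole file as UTF-16 (format -1) instead of ASCII (format 0).
sData = ""
With oFSO.OpenTextFile(sFile, ForReading, False, TristateTrue)
    If Not .AtEndOfStream Then sData = .ReadAll
    .Close
End With

' Write the cleaned text back in the same encoding.
With oFSO.OpenTextFile(sFile, ForWriting, True, TristateTrue)
    .Write TruncateAtNul(sData)
    .Close
End With
```

Writing to a separate output file first is the safer choice while testing, since the sketch replaces the original content.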