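I have the following code to replace NUL characters in a text file. It works as required for smaller files, but the run time grows as the file size increases. I have a file with more than 200,000 lines and over 160 MB in size. I ran the code on this file and waited more than 2 hours for it to finish.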
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
If (WScript.Arguments.Count > 0) Then
    sInfile = WScript.Arguments(0)
Else
    WScript.Echo "No filename specified."
    WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
    sOutfile = WScript.Arguments(1)
Else
    sOutfile = sInfile
End If
'Get the text file from cmd file
sData = ""
FinalData = ""
sInfile = WScript.Arguments(1)
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "\x00.*"
re.Global = True
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Do Until f.AtEndOfStream
    sData = Replace(f.ReadLine, vbCrLf, "")
    FinalData = FinalData + re.Replace(sData, "") + vbCrLf
Loop
f.Close
Set oOutfile = oFSO.OpenTextFile(sOutfile, 2, True, -1)
oOutfile.Write(FinalData)
oOutfile.Close
Set oOutfile = Nothing
Set oFS = Nothing
WScript.Quit
Is there any way to optimize the code so that it runs in less time?
Edit 1: updated code:
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
If (WScript.Arguments.Count > 0) Then
    sInfile = WScript.Arguments(0)
Else
    WScript.Echo "No filename specified."
    WScript.Quit
End If
If (WScript.Arguments.Count > 1) Then
    sOutfile = WScript.Arguments(1)
Else
    sOutfile = sInfile
End If
'Get the text file from cmd file
sData = ""
FinalData = ""
sInfile = WScript.Arguments(1)
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "\x00.*"
re.Global = True
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Do Until f.AtEndOfStream
    sData = Replace(f.ReadAll, vbCrLf, "")
    FinalData = FinalData + re.Replace(sData, "") + vbCrLf
Loop
f.Close
Set oOutfile = oFSO.OpenTextFile(sOutfile, 2, True, -1)
oOutfile.Write(FinalData)
oOutfile.Close
Set oOutfile = Nothing
Set oFS = Nothing
WScript.Quit
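Answer 0 (score: 1)
Don't use ReadAll for large files. Reading a large file into memory can exhaust the available RAM on the machine, at which point the system starts swapping and grinds to a halt.
Also avoid concatenating strings in a loop. Each concatenation copies the entire string built so far, so the operation gets slower as the result grows.
Change this: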
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Do Until f.AtEndOfStream
    sData = Replace(f.ReadLine, vbCrLf, "")
    FinalData = FinalData + re.Replace(sData, "") + vbCrLf
Loop
f.Close
Set oOutfile = oFSO.OpenTextFile(sOutfile, 2, True, -1)
oOutfile.Write(FinalData)
oOutfile.Close
to this:
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Set oOutfile = oFSO.OpenTextFile(sOutfile, 2, True, -1)
Do Until f.AtEndOfStream
    oOutfile.WriteLine re.Replace(f.ReadLine, "")
Loop
f.Close
oOutfile.Close
The same code with string operations instead of a regular-expression replacement:
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Set oOutfile = oFSO.OpenTextFile(sOutfile, 2, True, -1)
Do Until f.AtEndOfStream
    line = f.ReadLine
    pos = InStr(line, Chr(0))
    If pos > 0 Then line = Left(line, pos - 1)
    oOutfile.WriteLine line
Loop
f.Close
oOutfile.Close
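One thing the streaming loop above does not cover is the case where the output file is the same as the input file, which is the question's default when only one argument is given. Opening a file for writing while it is still being read is unlikely to work. The following is a minimal sketch of one way around that, assuming it is acceptable to write to a temporary file in the output folder and swap it in afterwards. The variable sTemp and the swap step are my own additions, not part of the answer.

```vbscript
Option Explicit
Const ForReading = 1
Const ForWriting = 2
Const TristateTrue = -1   ' open as Unicode, as in the question (-1)

Dim oFSO, sInfile, sOutfile, sTemp, fIn, fOut, line, pos

If WScript.Arguments.Count = 0 Then
    WScript.Echo "No filename specified."
    WScript.Quit 1
End If

Set oFSO = CreateObject("Scripting.FileSystemObject")
sInfile = oFSO.GetAbsolutePathName(WScript.Arguments(0))
If WScript.Arguments.Count > 1 Then
    sOutfile = oFSO.GetAbsolutePathName(WScript.Arguments(1))
Else
    sOutfile = sInfile
End If

' Assumption: a temporary file next to the output file is acceptable.
sTemp = oFSO.BuildPath(oFSO.GetParentFolderName(sOutfile), oFSO.GetTempName())

Set fIn = oFSO.OpenTextFile(sInfile, ForReading, False, TristateTrue)
Set fOut = oFSO.OpenTextFile(sTemp, ForWriting, True, TristateTrue)
Do Until fIn.AtEndOfStream
    line = fIn.ReadLine
    ' Cut the line at the first NUL, like the pattern "\x00.*"
    pos = InStr(line, Chr(0))
    If pos > 0 Then line = Left(line, pos - 1)
    fOut.WriteLine line
Loop
fIn.Close
fOut.Close

' Swap the finished temporary file into place.
If oFSO.FileExists(sOutfile) Then oFSO.DeleteFile sOutfile, True
oFSO.MoveFile sTemp, sOutfile
```

Only one line is held in memory at a time, so memory use should stay flat regardless of file size.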
Answer 1 (score: 1)
I know this isn't recent, but it may be useful to someone.
I tried another approach, and it takes about 5 seconds! :)
It seems that the script engine (wscript) or FileSystemObject has a problem loading 160 MB at once (via the .ReadAll method).
So I tried loading all the data line by line with .ReadLine (into a Dictionary), processing it, and then saving it to the output file in one go.
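The answer's own script is not reproduced in this text, so what follows is only a rough sketch of the approach it describes, not the answerer's code. It assumes the same Unicode file mode as the question (-1) and the same "cut at the first NUL" behaviour. The names oLines, line and pos are hypothetical, and the CreateData option mentioned in the answer is not implemented here. The 5-second figure is the answerer's measurement and has not been checked against this sketch.

```vbscript
Option Explicit
Dim oFSO, oLines, f, line, pos, sInfile, sOutfile

If WScript.Arguments.Count = 0 Then
    WScript.Echo "No filename specified."
    WScript.Quit 1
End If
sInfile = WScript.Arguments(0)
If WScript.Arguments.Count > 1 Then
    sOutfile = WScript.Arguments(1)
Else
    sOutfile = sInfile
End If

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oLines = CreateObject("Scripting.Dictionary")

' Read line by line and store each processed line under a running index.
Set f = oFSO.OpenTextFile(sInfile, 1, False, -1)
Do Until f.AtEndOfStream
    line = f.ReadLine
    pos = InStr(line, Chr(0))
    If pos > 0 Then line = Left(line, pos - 1)
    oLines.Add oLines.Count, line
Loop
f.Close

' Join the stored lines once and write the result in a single call.
Set f = oFSO.OpenTextFile(sOutfile, 2, True, -1)
If oLines.Count > 0 Then f.Write Join(oLines.Items, vbCrLf) & vbCrLf
f.Close
```

Joining once at the end avoids the repeated string concatenation in the question's loop. Unlike a fully streamed version, though, the whole file is still held in memory. Because the input is closed before the output is opened, the input and output paths can be the same.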
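Addendum:
- I added an option to create a test file, if you pass "CreateData" as the second argument:
wscript util.vbs "C:\Temp\SampleData.txt" CreateData
- You don't need to strip CR+LF from the string returned by .ReadLine. They are already omitted.
- It is sometimes better to test .AtEndOfStream before calling the .ReadAll method, because the method raises a run-time error if the file is empty.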
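The last point can be illustrated with a small guard. This is a sketch only: it assumes the file name comes from the first command-line argument and that the file is opened as Unicode, as in the question. The variable sData is just a placeholder.

```vbscript
Option Explicit
Dim oFSO, f, sData

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set f = oFSO.OpenTextFile(WScript.Arguments(0), 1, False, -1)

' An empty file is already at end of stream, so skip ReadAll in that case.
If f.AtEndOfStream Then
    sData = ""
Else
    sData = f.ReadAll
End If
f.Close

WScript.Echo "Read " & Len(sData) & " characters."
```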