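I need to strip certain kinds of special characters from files before they are loaded into the processing system used at my workplace. I am using the script below, which I wrote myself, but it relies on a loop and is very inefficient. A file of roughly 100,000 rows takes 5-10 minutes to process.
Can anyone suggest a more efficient solution? I am generally more comfortable with regular expressions, but they are not required if there is a better way to make sure carriage returns, line feeds, and any other characters I may define later are still removed. That may be overkill for what I need, though. I suspect it would run faster if I did not have to loop through every row and column.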
Function ApplyPattern(WordString, Pattern, ReplaceWith)
    Dim RegEx As New VBScript_RegExp_55.RegExp
    RegEx.Pattern = Pattern
    RegEx.IgnoreCase = True
    RegEx.Global = True
    RegEx.MultiLine = True
    ApplyPattern = RegEx.Replace(WordString, ReplaceWith)
End Function
Sub CleanFile()
    '
    'This process cleans the data within a worksheet using the regex pattern defined below
    '
    Dim NumRows As Long 'Index of the last used row
    Dim NumCols As Long 'Index of the last used column
    Dim TargRow As Long 'Row counter for the cleaning loop
    Dim TargCol As Long 'Column counter for the cleaning loop
    Dim Pattern As String
    Pattern = "[^a-zA-Z_0-9\.\t)(,# ]" 'Matches every character that is NOT in the allowed set
    Cells(1, 1).Select 'Selects the first cell of the active worksheet
    'Find the number of rows
    NumRows = Worksheets(1).Cells.Find(What:="*", After:=Worksheets(1).Cells(1, 1), LookIn:=xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlPrevious, MatchCase:=False).Row
    'Find the number of columns
    NumCols = Worksheets(1).Cells.Find(What:="*", After:=Worksheets(1).Cells(1, 1), LookIn:=xlFormulas, LookAt:=xlPart, SearchOrder:=xlByColumns, SearchDirection:=xlPrevious, MatchCase:=False).Column
    'Loop for cleaning data
    For TargRow = 1 To NumRows
        Cells(TargRow, NumCols + 1).Value = TargRow 'Adds a row number for later use
        For TargCol = 1 To NumCols
            Cells(TargRow, TargCol).Value = ApplyPattern(Cells(TargRow, TargCol).Value, Pattern, "") 'Applies the regex pattern to the cell
        Next TargCol
    Next TargRow
End Sub
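For context, here is a rough sketch of the kind of alternative I have in mind. It is not tested against my real files and is not a confirmed fix. It still loops, but over an in-memory Variant array rather than individual cells, and it writes the result back in one assignment. The name CleanFileViaArray is made up for illustration. The sketch assumes late binding through CreateObject("VBScript.RegExp") and uses UsedRange instead of the Find calls above, which may cover a slightly different area. It also leaves out the row-number column.

```vba
Sub CleanFileViaArray()
    ' Hypothetical sketch: clean the whole sheet in memory, then write it back once.
    Dim RegEx As Object
    Dim Target As Range
    Dim Data As Variant
    Dim r As Long, c As Long

    ' One regex object reused for every cell (late bound, so no library reference is needed)
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Pattern = "[^a-zA-Z_0-9\.\t)(,# ]"
    RegEx.Global = True

    Set Target = Worksheets(1).UsedRange
    ' For a single cell, .Value returns a scalar rather than a 2-D array, so skip that case
    If Target.Cells.Count = 1 Then Exit Sub

    Data = Target.Value ' One read of the entire range into a 1-based 2-D array

    For r = 1 To UBound(Data, 1)
        For c = 1 To UBound(Data, 2)
            ' Leave empty cells and error values (e.g. #N/A) untouched
            If Not IsEmpty(Data(r, c)) And Not IsError(Data(r, c)) Then
                Data(r, c) = RegEx.Replace(CStr(Data(r, c)), "")
            End If
        Next c
    Next r

    Target.Value = Data ' One write of the cleaned array back to the sheet
End Sub
```

As with my original script, any formulas in the range would be replaced by their cleaned values.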
Thanks!