我是VBA的新手。在这里发布我的问题之前,我花了将近3天的时间上网。
我有300多个文本文件(使用OCR从PDF转换的文本),来自文本文件。我需要获得包含“alphabet”和“digits”的所有单词(例如KT315A,KT-315-a等)以及源引用(txt文件名)。
我需要的是
1.add“智能过滤器”,仅复制包含
的单词
“字母”和“数字”
将复制的数据粘贴到A列
将参考文件名添加到B列
我在下面找到了可以将文本文件中的所有数据复制到Excel电子表格中的代码。
文本文件看起来像
“从252A-552A到ddddd,,,, @,@,rrrr,22,...... kt3443,fff ,,,等”
xls中的最终结果应为
A | B
252A-552A | file1
kt3443 |文件1
Option Explicit
Const sPath = "C:\outp\" 'remember end backslash
Const delim = "," 'comma delimited text file - EDIT
'Const delim = vbTab 'for TAB delimited text files
Sub ImportMultipleTextFiles()
Dim wb As Workbook
Dim sFile As String
Dim inputRow As Long
RefreshSheet
On Error Resume Next
sFile = Dir(sPath & "*.txt")
Do Until sFile = ""
inputRow = Sheets("Temp").Range("A" & Rows.Count).End(xlUp).Row + 1
'open the text file
'format=6 denotes a text file
Set wb = Workbooks.Open(Filename:=sPath & sFile, _
Format:=6, _
Delimiter:=delim)
'copy and paste
wb.Sheets(1).Range("A1").CurrentRegion.Copy _
Destination:=ThisWorkbook.Sheets("Temp").Range("A" & inputRow)
wb.Close SaveChanges:=False
'get next text file
sFile = Dir()
Loop
Set wb = Nothing
End Sub
Sub RefreshSheet()
'delete old sheet and add a new one
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Temp").Delete
Application.DisplayAlerts = True
Worksheets.Add
ActiveSheet.Name = "Temp"
On Error GoTo 0
End Sub
谢谢!
答案 0 :(得分:2)
从你的例子中确切地说出一个词是什么有点难以理解。它显然可以包含字母和数字以外的字符(例如破折号),但有些项目前面有点,因此不能将其定义为由space
分隔。
我定义了一个"字"
的字符串为此,我首先用空格替换所有逗号,然后应用适当的正则表达式。但是,这可能会接受不需要的字符串,因此您可能需要更具体地定义确切的单词。
此外,不是将整个文件读入Excel工作簿,而是使用FileSystemObject
,我们可以一次处理一行,而无需将300个文件读入Excel。正如您所做的那样,通过VBA代码中的常量设置基本文件夹。
但还有其他方法可以做到这一点。
请务必设置早期绑定的引用,如代码中所示:
Option Explicit
'Set References to:
' Microsoft Scripting Runtime
' Microsoft VBscript Regular Expressions 5.5
Sub SearchMultipleTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream, FO As Folder, FI As File, FIs As Files
Dim RE As RegExp, MC As MatchCollection, M As Match
Dim WS As Worksheet, RW As Long
Const sPath As String = "C:\Users\Ron\Desktop"
Set FSO = New FileSystemObject
Set FO = FSO.GetFolder(sPath)
Set WS = ActiveSheet
WS.Columns.Clear
Set RE = New RegExp
With RE
.Global = True
.Pattern = "(?:\d(?=\S*[a-z])|[a-z](?=\S*\d))+\S*[a-z\d]"
.IgnoreCase = True
End With
For Each FI In FO.Files
If FI.Name Like "*.txt" Then
Set TS = FI.OpenAsTextStream(ForReading)
Do Until TS.AtEndOfStream
'Change .ReadLine to .ReadAll *might* make this run faster
' but would need to be tested.
Set MC = RE.Execute(Replace(TS.ReadLine, ",", " "))
If MC.Count > 0 Then
For Each M In MC
RW = RW + 1
WS.Cells(RW, 1) = M
WS.Cells(RW, 2) = FI.Name
Next M
End If
Loop
End If
Next FI
End Sub