我有一个问题要在Visual Basic中提高效率。我想做的是以下事情:
这是我的处理方法:
因此该过程运行良好,我得到了想要的结果,但是却很慢……(在我的计算机上,需要200分钟才能处理大约200个文件,该文件一次附加包含500,000行和200列。一位同事设法做到了一个类似的过程,使用R中的data.table包附加所有文件,并且他能够在5-10分钟内在同一台计算机上对相同的.csv表执行相同的附加操作) 我想知道是否有比“逐个单元”检查文件更好的选择?我可以从源文件中识别不需要的列,然后将其完全删除吗?是否具有将文件附加在一起而不是读取每个单元格然后将它们写回的功能?
在此先感谢您的帮助!
编辑:或者,是否存在另一种编程语言(Python?Power-Shell?)在处理这种文件时效率更高?
Edit2:有关为何我认为它运行缓慢的更多详细信息。
Edit3:与评论中要求的问题有关的一段代码:
'We loop through each file in the folder location
For Each file As String In files
Dim objReader As New System.IO.StreamReader(file)
Dim fileInfo As New IO.FileInfo(file)
'We only loop through the valid .rpt files
If CheckValidRPTFile(file, True) = True Then
'Count the number of files we go through
temp_count = temp_count + 1
'Count the number of lines in the file (called FileSize)
Dim objReaderLineCOunt As New System.IO.StreamReader(file)
FileSize = 0
Do While objReaderLineCOunt.Peek() <> -1
TextLine = objReaderLineCOunt.ReadLine()
FileSize = FileSize + 1
Loop
temp_line = 1
'We loop through line by line for a given file
Do While objReader.Peek() <> -1
'We split into an array using the comma delimiter
TextLine = objReader.ReadLine()
TextLineSplit = TextLine.Split(", ")
'Skip some lines
If Strings.Left(TextLine, 1) = "*" Then
'We loop through the number of field we wish to extract for the file
For temp_field = 1 To HeaderNbOfField
If temp_field = 1 Then
BodyString = "('" & Strings.Left(fileInfo.Name, Len(fileInfo.Name) - 4) & "',"
End If
'The array HeaderMapId tells us where to pick the information from the file
'This assumes that each file in the folder have same header as the 'HeaderFile'
If HeaderMapId(Is_MPF_Type, temp_field) = Is_Not_found Then
BodyString = BodyString & "98766789"
Else
BodyString = BodyString & TextLineSplit(HeaderMapId(Is_MPF_Type, temp_field) - 1)
End If
If temp_field <> HeaderNbOfField Then
BodyString = BodyString & ","
Else
BodyString = BodyString & ")"
End If
Next
'We replace double quotes with single quotes
BodyString = Replace(BodyString, """", "'")
'This Line is to add records to the .csv file
If Enable_CSV_Output = "Yes" Then
'Remove braquets and single quotes
outFile.WriteLine(Replace(Replace(Replace(BodyString, ")", ""), "(", ""), "'", ""))
End If
End If
temp_line = temp_line + 1
Loop
temp_file = temp_file + 1
End If
Next
outFile.Close()
基于RobertBaron的评论更新的代码:
'We loop through each file in the folder location
For Each file As String In files
Dim objReader As New System.IO.StreamReader(file)
Dim fileInfo As New IO.FileInfo(file)
'We only loop through the valid .rpt files
If CheckValidRPTFile(file, True) = True Then
'Count the number of files we go through
temp_count = temp_count + 1
'Count the number of lines in the file (called FileSize)
'Dim objReaderLineCOunt As New System.IO.StreamReader(file)
'FileSize = 0
'Do While objReaderLineCOunt.Peek() <> -1
'TextLine = objReaderLineCOunt.ReadLine()
'FileSize = FileSize + 1
'Loop
'temp_line = 1
''We loop through line by line for a given file
'Do While objReader.Peek() <> -1
Dim TextLines() As String = System.IO.File.ReadAllLines(file)
For Each TextLine2 In TextLines
'Update the Progress Bar
temp_full_count = temp_full_count + 1
'We split the libe into an array using the comma delimiter
'TextLine = objReader.ReadLine()
TextLineSplit = TextLine2.Split(", ")
'Skip line that are not actual prophet records (skip header and first few lines)
If Strings.Left(TextLine2, 1) = "*" Then
'We loop through the number of field we wish to extract for the file
For temp_field = 1 To HeaderNbOfField
If temp_field = 1 Then
BodyString = "('" & Strings.Left(fileInfo.Name, Len(fileInfo.Name) - 4) & "',"
End If
'The array HeaderMapId tells us where to pick the information from the file
'This assumes that each file in the folder have same header as the 'HeaderFile'
If HeaderMapId(Is_MPF_Type, temp_field) = Is_Not_found Then
BodyString = BodyString & "98766789"
Else
BodyString = BodyString & TextLineSplit(HeaderMapId(Is_MPF_Type, temp_field) - 1)
End If
If temp_field <> HeaderNbOfField Then
BodyString = BodyString & ","
Else
BodyString = BodyString & ")"
End If
Next
'We replace double quotes with single quotes
BodyString = Replace(BodyString, """", "'")
'This Line is to add records to the .csv file
If Enable_CSV_Output = "Yes" Then
'Remove braquets and single quotes
outFile.WriteLine(Replace(Replace(Replace(BodyString, ")", ""), "(", ""), "'", ""))
End If
End If
temp_line = temp_line + 1
Next
TB_Runlog.Text = "Completed: " & fileInfo.Name & Environment.NewLine & TB_Runlog.Text
temp_file = temp_file + 1
End If
Next
outFile.Close()
答案 0 :(得分:0)
加快程序速度的一种方法是减少对磁盘的访问次数。现在,您正在逐行读取每个文件两次。每个文件最有可能适合内存。因此,您可以做的是读取存储器中文件的所有行,然后处理它的行。这样会更快。
类似的东西:
'We only loop through the valid .rpt files
If CheckValidRPTFile(file, True) = True Then
''Count the number of files we go through
'temp_count = temp_count + 1
''Count the number of lines in the file (called FileSize)
'Dim objReaderLineCOunt As New System.IO.StreamReader(file)
'FileSize = 0
'Do While objReaderLineCOunt.Peek() <> -1
' TextLine = objReaderLineCOunt.ReadLine()
' FileSize = FileSize + 1
'Loop
'temp_line = 1
''We loop through line by line for a given file
'Do While objReader.Peek() <> -1
Dim TextLines() As String = System.IO.File.ReadAllLines(file)
For Each TextLine In TextLines
'We split into an array using the comma delimiter
'TextLine = objReader.ReadLine()
TextLineSplit = TextLine.Split(", ")