Fastest way to read a large text file (6 GB) line by line

Date: 2020-07-25 16:05:06

Tags: vb.net multithreading file-io

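I have a 6 GB ".txt" data file that I need to read and sort. To do this fast enough, I am thinking of using multiple threads that read different chunks of the same file at the same time and sort the lines they read. Is there a way to do this?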

Visualizing the task:

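Imports System.IO
Imports System.Threading

' (inside the form class)
Dim SalaryTXTPath As String = "C:\6_GB_Salary_Data.txt"

Dim Threads As Integer = 3

Dim RichCount As Integer = 0
Dim PoorCount As Integer = 0
Dim MidCount As Integer = 0
Private ReadOnly CountLock As New Object() ' dedicated lock object; locking on a string literal is fragile

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    ' GetDataChunks and DataChunk are hypothetical: split the file into `Threads` chunks
    Dim CHUNKS() As DataChunk = GetDataChunks(SalaryTXTPath, Threads)
    For i As Integer = 0 To Threads - 1 ' valid indices are 0..Threads-1
        Dim chunk As DataChunk = CHUNKS(i) ' per-iteration copy so each lambda captures its own chunk
        Dim ReadTHR As New Thread(Sub() ReadTXTChunks(chunk)) ' send each chunk to a new thread
        ReadTHR.Start()
    Next
End Sub

Private Sub ReadTXTChunks(CHUNK As DataChunk)
    ' CHUNK.ReadLines() is hypothetical: enumerates the lines that belong to this chunk
    For Each line As String In CHUNK.ReadLines()
        Dim value As Integer = Convert.ToInt32(line) ' parse outside the lock to keep it short
        SyncLock CountLock
            Select Case value
                Case Is < 100
                    PoorCount += 1
                Case Is < 1000
                    MidCount += 1
                Case Is < 100000
                    RichCount += 1
            End Select
        End SyncLock
    Next
End Sub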

Note: the code above is a hypothetical visualization of the task; some of the usage may be wrong.
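The question leaves the chunk-splitting helper undefined. One possible way to build it, sketched here under the assumption that lines end in LF (CRLF also works) and that the file uses an ASCII-compatible encoding such as UTF-8, is to cut the file into roughly equal byte ranges and push each cut forward to just past the next newline, so no line is split between two threads. The module name `ChunkSplitter`, the function `GetLineAlignedChunks` and the `(Start, Length)` tuple are hypothetical. Each worker would then open its own `FileStream` with `FileShare.Read`, `Seek` to `Start` and read exactly `Length` bytes. On a single disk the reading itself is often I/O-bound, so extra threads may mainly speed up the parsing rather than the reads.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module ChunkSplitter
    ' Hypothetical helper: splits a file into at most chunkCount (>= 1) byte ranges.
    ' Every range except the last ends just after a LF byte (10), so no line is
    ' split across two ranges. Assumes an ASCII-compatible encoding (e.g. UTF-8).
    Public Function GetLineAlignedChunks(filePath As String, chunkCount As Integer) _
            As List(Of (Start As Long, Length As Long))
        Dim result As New List(Of (Start As Long, Length As Long))()
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
            Dim fileLength As Long = fs.Length
            Dim approxSize As Long = fileLength \ chunkCount
            Dim start As Long = 0
            For i As Integer = 1 To chunkCount
                Dim chunkEnd As Long
                If i = chunkCount Then
                    chunkEnd = fileLength ' the last range takes whatever is left
                Else
                    ' Jump to the approximate cut, then scan forward to the next LF.
                    fs.Seek(Math.Max(start, i * approxSize), SeekOrigin.Begin)
                    Dim b As Integer = fs.ReadByte()
                    While b <> -1 AndAlso b <> 10
                        b = fs.ReadByte()
                    End While
                    chunkEnd = fs.Position ' just past the LF, or the end of the file
                End If
                ' Skip empty ranges (e.g. when a long line swallowed the next cut).
                If chunkEnd > start Then result.Add((start, chunkEnd - start))
                start = chunkEnd
            Next
        End Using
        Return result
    End Function
End Module
```

For example, `GetLineAlignedChunks("C:\6_GB_Salary_Data.txt", 3)` would return three (or fewer) contiguous ranges that together cover the whole file.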

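Edit: I solved the problem using Parallel.ForEach. The Parallel methods were new to me, and because there are far fewer VB.NET examples than C# ones, it took me a while to work out the syntax, but your comments led me to this method.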

Parallel.ForEach syntax (VB.NET)

        ' Requires at the top of the file: Imports System.IO, System.Threading, System.Threading.Tasks
        Dim CancelToken As New CancellationTokenSource() ' token for cancelling the loop if needed

        Dim POptions As New ParallelOptions() ' options argument for Parallel.ForEach
        POptions.MaxDegreeOfParallelism = Environment.ProcessorCount ' max number of concurrent workers
        POptions.CancellationToken = CancelToken.Token ' attach the cancellation token

        ' File.ReadLines streams the file one line at a time; File.ReadAllLines would
        ' load every line of the 6 GB file into memory before the loop even starts.
        Parallel.ForEach(File.ReadLines("Filepath"), POptions,
            Sub(ReadedLine)
                ' YOUR CODE, for example:
                ' RichTextBox1.Invoke(Sub()
                '                         RichTextBox1.Text &= ReadedLine
                '                     End Sub)
            End Sub)
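Applied to the salary-counting task from the question, a possible sketch (not the asker's final code) combines `File.ReadLines`, which streams the file, with the `Parallel.ForEach` overload that takes `localInit` and `localFinally` delegates. Each worker counts into its own array and the totals are merged once per worker with `Interlocked.Add`, instead of taking a lock on every line. The module name `SalaryCounter` and the function `CountBrackets` are hypothetical; the thresholds come from the question. The file is still pulled from a single enumerator, so only the per-line work runs in parallel.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading
Imports System.Threading.Tasks

Public Module SalaryCounter
    ' Hypothetical sketch: classifies each line with the question's thresholds
    ' (< 100 poor, < 1000 mid, < 100000 rich). Lines that are not integers,
    ' or are >= 100000, are skipped. Returns {poor, mid, rich}.
    Public Function CountBrackets(lines As IEnumerable(Of String)) As Long()
        Dim totals(2) As Long ' shared totals: 0 = poor, 1 = mid, 2 = rich
        Parallel.ForEach(Of String, Long())(
            lines,
            Function() New Long(2) {}, ' localInit: fresh counters for each worker
            Function(line As String, state As ParallelLoopState, counts As Long())
                Dim value As Integer
                If Integer.TryParse(line, value) Then
                    Select Case value
                        Case Is < 100
                            counts(0) += 1
                        Case Is < 1000
                            counts(1) += 1
                        Case Is < 100000
                            counts(2) += 1
                    End Select
                End If
                Return counts ' handed to the same worker's next iteration
            End Function,
            Sub(counts As Long())
                ' localFinally: merge this worker's counters into the shared totals atomically.
                Interlocked.Add(totals(0), counts(0))
                Interlocked.Add(totals(1), counts(1))
                Interlocked.Add(totals(2), counts(2))
            End Sub)
        Return totals
    End Function
End Module
```

Usage could look like `Dim counts = SalaryCounter.CountBrackets(File.ReadLines("C:\6_GB_Salary_Data.txt"))`, ideally called from a background task so the form stays responsive.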


Thanks for your help...

0 answers:

No answers yet.