Edit: I have now posted all of the routines in use (the Sub and both Functions). I have also added comments marking the section that slows down over time.

Before anyone mentions FileHelpers: I don't want to use third-party components.

So, I have a CSV file with 3.5 million rows, and I'm parsing each line of the CSV and inserting it into SQLite (a non-indexed table, to keep things fast). I also buffer 100,000 lines at a time from the CSV. Everything works, but inside my loop (after buffering 100,000 lines into a list of strings), the code below iterates, splits each line, builds up an insert string, and executes it. It starts out processing roughly 1,000 rows per second, which I can live with, but after a few hundred thousand rows it slows to around 200 rows per second, and after about 600,000 rows it crawls at 25 rows per second. Somewhere past 2 million rows it drops further, to 10-15 rows per second. I really want to keep it at 1,000 rows per second (or even improve on that if possible).

I have left out some code: a call to a FixsQuote routine that sorts out any quotes, and an IF THEN statement that determines whether a quote is present and splits the line differently. This particular 3.5-million-row CSV contains no quotes, so I just wanted to reduce the amount of code posted here to make it clearer reading.

Originally I used a plain old array to manage the split, and it behaved the same way: slowing down after some thousands of rows. Looking around the internet, it seemed a list of strings would be better, so I converted a few lines of code to use one in my loop, but it made no difference. I have a feeling that, even though I no longer appear to be using arrays, something is still dragging the speed down. Either something isn't being reused properly, or there is a stack or heap issue I'm not aware of?

The slowdown is not during the bulk INSERT. I can remove the insert and execute to SQLite entirely and it still slows down, so SQLite is not the problem; it is something to do with the list or the strings. I will try the suggestion of bumping the .NET version. I currently have this on 4.6, so I will raise it to 4.7.1.

Here is the code; maybe someone can spot something obvious.
' Requires: Imports System.Text.RegularExpressions (for the Split helper below).
Private Sub CSVImport()
    Dim SQLStr As New Text.StringBuilder
    Dim BigSQLStr As New Text.StringBuilder
    Dim Comma As String = ""
    Dim FirstInsertStr As New Text.StringBuilder
    Dim BufferT As Integer = 0
    Try
        If IO.File.Exists(FileName) = True Then
            Dim C As Integer
            Dim line As String
            FirstInsertStr.Clear()
            FirstInsertStr.Capacity = 0
            FirstInsertStr.Append("INSERT INTO " & Chr(34) & DestinationTableName & Chr(34))
            FirstInsertStr.Append(" (" & Chr(34) & "LON" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "LAT" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "NUMBER" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "STREET" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "UNIT" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "CITY" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "DISTRICT" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "REGION" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "POSTCODE" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "ADDRESSIO_ID" & Chr(34))
            FirstInsertStr.Append(", " & Chr(34) & "HASH" & Chr(34))
            FirstInsertStr.Append(") VALUES (")
            BufferT = 0
            BigSQLStr.Clear()
            BigSQLStr.Capacity = 0
            ' Create new StreamReader instance with a Using block.
            Using reader As IO.StreamReader = New IO.StreamReader(FileName)
                line = reader.ReadLine ' first line is headers, so skip it
                Do Until line = Nothing
                    ' Buffer up to 100,000 lines at a time.
                    Dim BufferRead As New List(Of String)
                    Dim BufferLoad As Integer = 0
                    Do Until BufferLoad = 100000
                        line = reader.ReadLine
                        If line = Nothing Then
                            Exit Do
                        End If
                        BufferRead.Add(line)
                        BufferLoad += 1
                    Loop
                    Dim Z As Integer = 0
                    For Z = 0 To BufferLoad - 1
                        BufferT += 1
                        Comma = ""
                        ' ***** I BELIEVE THE SLOWDOWN IS WITHIN HERE: FAST AT FIRST, THEN SLOWS DOWN AFTER 30k RECORDS OR SO *****
                        Dim objFields2 As New List(Of String)
                        If BufferRead(Z).Contains(Chr(34)) = True Then
                            BufferRead(Z) = FixsQuote(BufferRead(Z))
                            objFields2.AddRange(Split(BufferRead(Z), ",", Chr(34), True))
                        Else
                            objFields2.AddRange(BufferRead(Z).Split(","))
                        End If
                        With SQLStr
                            .Clear()
                            .Capacity = 0
                            For C = 0 To objFields2.Count - 1
                                If C > 11 Then
                                    Exit For ' we only ever want the first 11 fields.
                                End If
                                If C > 0 Then
                                    Comma = ","
                                End If
                                If IsDBNull(objFields2(C)) = False Then
                                    If objFields2(C).Contains(Chr(34)) = True Then
                                        If objFields2(C).Replace(Chr(34), "").Length > 0 Then
                                            .Append(Comma & "'" & FixsQuote(objFields2(C).Replace(Chr(34), "")) & "'")
                                        Else
                                            .Append(Comma & "Null")
                                        End If
                                    Else
                                        If objFields2(C).Length > 0 Then
                                            .Append(Comma & "'" & FixsQuote(objFields2(C)) & "'")
                                        Else
                                            .Append(Comma & "Null")
                                        End If
                                    End If
                                Else
                                    .Append(Comma & "Null")
                                End If
                            Next
                            BigSQLStr.Append(FirstInsertStr.ToString)
                            BigSQLStr.Append(.ToString)
                            BigSQLStr.Append(");")
                            ' Now insert what we have, 1,000 rows at a time.
                            If BufferT = 1000 Then
                                Using OleCMD As New SQLite.SQLiteCommand(BigSQLStr.ToString, AddressesIOSQLDB)
                                    OleCMD.CommandTimeout = 0
                                    OleCMD.ExecuteNonQuery()
                                End Using
                                BigSQLStr.Clear()
                                BigSQLStr.Capacity = 0
                                BufferT = 0
                            End If
                        End With
                        objFields2.Clear()
                        objFields2 = Nothing
                    Next
                    BufferRead.Clear()
                    ' ***** end of what I believe is the slowdown *****
                Loop
                ' Flush any remaining rows that didn't fill a 1,000-row batch.
                If BufferT > 0 Then
                    Try
                        Using OleCMD As New SQLite.SQLiteCommand(BigSQLStr.ToString, AddressesIOSQLDB)
                            OleCMD.CommandTimeout = 0
                            OleCMD.ExecuteNonQuery()
                        End Using
                    Catch ex As Exception
                        Stop
                    End Try
                    BufferT = 0
                End If
            End Using
        End If
    Catch ex As Exception
        Stop
    End Try
    BigSQLStr.Clear()
    BigSQLStr.Capacity = 0
    FirstInsertStr.Clear()
    FirstInsertStr.Capacity = 0
    SQLStr.Clear()
    SQLStr.Capacity = 0
End Sub

Private Function FixsQuote(ByVal s As String) As String
    ' Escape single quotes for use inside a SQL string literal.
    Return s.Replace("'", "''")
End Function

Private Function Split(ByVal expression As String,
                       ByVal delimiter As String,
                       ByVal qualifier As String,
                       ByVal ignoreCase As Boolean) As List(Of String)
    ' Split on delimiters that are not enclosed by the qualifier character.
    Dim _Statement As String = String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))", Regex.Escape(delimiter), Regex.Escape(qualifier))
    Dim _Options As RegexOptions = RegexOptions.Compiled Or RegexOptions.Multiline
    If ignoreCase Then _Options = _Options Or RegexOptions.IgnoreCase
    Dim _Expression As Regex = New Regex(_Statement, _Options)
    Return _Expression.Split(expression).ToList
End Function
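As a side note on the Split helper above: the regex only splits on delimiters that sit outside a pair of qualifier characters. A minimal usage sketch (the input string here is hypothetical, not from the CSV in question):

```vbnet
' Hypothetical demonstration of the quote-aware Split helper above:
' commas inside double quotes are not treated as field separators.
Dim fields As List(Of String) = Split("a,""b,c"",d", ",", Chr(34), True)
' fields now holds three entries: a | "b,c" | d
```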
Answer 0 (score: 0)

I cannot believe this, but after looking at TnTinMn's suggestion (the conversion now takes less than a second) and removing the BigSQLStr and SQLStr StringBuilders, since I now perform each insert inside a transaction, I'm recording 20,000 records per second. Guys, I'm blown away. Thank you so much for making me dissect my code instead of assuming the StringBuilder was the answer to everything. From now on I will put my big inserts inside a transaction and leave the StringBuilders out. It is now vastly faster, and I have verified it with a record count in the database.
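For reference, a minimal sketch of the transaction-based approach described above, assuming System.Data.SQLite. The connection name AddressesIOSQLDB comes from the question; the table name "Addresses", the two-column shape, and the parsedRows collection are hypothetical and shortened for illustration:

```vbnet
' Sketch: wrap many parameterized inserts in one transaction.
' SQLite then commits (and syncs to disk) once per transaction
' instead of once per INSERT, which is where the speedup comes from.
Using trans As SQLite.SQLiteTransaction = AddressesIOSQLDB.BeginTransaction()
    Using cmd As New SQLite.SQLiteCommand(
        "INSERT INTO ""Addresses"" (""LON"", ""LAT"") VALUES (@lon, @lat)",
        AddressesIOSQLDB, trans)
        cmd.Parameters.Add("@lon", DbType.String)
        cmd.Parameters.Add("@lat", DbType.String)
        For Each fields As String() In parsedRows ' parsedRows: hypothetical buffered, split lines
            cmd.Parameters("@lon").Value = fields(0)
            cmd.Parameters("@lat").Value = fields(1)
            cmd.ExecuteNonQuery() ' reuses the same prepared statement each time
        Next
    End Using
    trans.Commit() ' one commit for the whole batch
End Using
```

A further benefit of parameterized commands is that quoting is handled by the driver, so the FixsQuote escaping and the per-row SQL string concatenation are no longer needed at all.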