在将大型ADO记录集从Access 2013传输到新的CSV文件时,我遇到了一些性能问题。
我的要求是:
1)文件必须包含列名称
2)Recordset可包含超过500,000条记录
3)文件必须以逗号分隔
4)记录包含有时包含逗号的字母数字字符串
5)程序必须创建一个新的CSV文件(不只是更新预先存在的文件)
注意:数据存储在记录集中,因为它是从MS-SQL Server查询的
我对VBA编程比较新,所以任何帮助都会受到高度赞赏。我读到从文件的开头到结尾运行的GetRows循环可以是将大型记录集导出为CSV的有效过程。致谢
这是我目前的实施,显然违反了我的一些要求:
'WRITE DATA TO TEXT FILE
Dim f As ADODB.Field
Dim myFileSystemObject As Object
Dim txtfile As Object
Set myFileSystemObject = CreateObject("Scripting.FileSystemObject")
Set txtfile = myFileSystemObject.CreateTextFile(strPath & ".txt", True)
With adRs
For Each f In .Fields
txtfile.Write (f.Name)
txtfile.Write Chr(9)
Next
txtfile.WriteLine
txtfile.Write adRs.GetString(adClipString)
.Close
End With
txtfile.Close
答案 0 :(得分:0)
你可以试试这个(虽然我不知道表现会受到影响;你的问题似乎有两个 - 我如何得到我的结果,然后我怎样才能有效地做到这一点。)
'WRITE DATA TO TEXT FILE
Dim f As ADODB.Field
Dim myFileSystemObject As Object
Dim txtfile As Object
Dim str As String
Set myFileSystemObject = CreateObject("Scripting.FileSystemObject")
Set txtfile = myFileSystemObject.CreateTextFile(strPath & ".txt", True)
With adRs
For Each f In .Fields
txtfile.Write (f.Name)
txtfile.Write Chr(9)
Next
txtfile.WriteLine
'Add in leading double quote, double quote text qualifier throughout, and changing carriage returns to double quote/carriage return/double quote.
str = Chr(34) & Replace(Replace(adRs.GetString, vbTab, Chr(34) & "," & Chr(34)), vbCr, Chr(34) & vbCr & Chr(34))
'Write to file removing last extraneous double quote.
txtfile.Write Mid(str, 1, Len(str) - 1)
.Close
End With
txtfile.Close
所有代码正在做的是添加"到字符串的开头,将所有以前的制表符更改为","并将所有回车更改为" [cr]"。写入功能中的替换应该删除最后一个无关的"。文本分隔符应该保护任何嵌入的逗号。再一次,不确定性能,但似乎你可以得到一些东西并运行到你的规格,至少。
编辑:如果这不能让你到达目的地,你也可以查看SQL Server的BCP实用程序。
答案 1 :(得分:0)
这里比赛的后期 - 这个问题超过3岁! - 并且你得到了一个答案,它实现了在引号中封装文本字段,这是一个称职答案的标记。
但是,很少有解决这三大问题的答案:
VBA中的初学者可能会发现字符串优化很难理解:本机VBA中可用的最大性能增益是避免字符串分配和连接(这就是为什么:http://www.aivosto.com/vbtips/stringopt2.html#huge) - 所以我使用join,split和替换而不是myString = MyString & MoreString
尾随循环,最后调用RecordSet.GetRows()
,会引起对结构化编程强烈意见的编码人员的注意。
所以这里是:
Public Function RecordsetToCSV(ByRef rst As ADODB.Recordset, _ ByRef OutputFile As String, _ Optional ByRef FieldList As Variant, _ Optional ByVal CoerceText As Boolean = True, _ Optional ByVal CleanupText As Boolean = True _ ) As Long
' Output a recordset to a csv file and returns the row count.
' If the output file is locked, or specified in an inaccessible location, the ' 'ByRef' OutputFile parameter becomes a file in the user's local temp folder
' You can supply your own field list. This isn't a substituted file header of ' aliased field names: it is a subset of the field names, which ADO will read ' selectively from the recordset. Each item in the list matches a named field
' CoerceText=TRUE will encapsulate all items, numeric or not, in quote marks. ' CleanupText=TRUE strips quotes and linefeeds from the data: FALSE is faster
' You should only set them FALSE if you're confident that the data is 'clean' ' with no quote marks, commas or line breaks in any unencapsulated text field
' This code handles unicode, and outputs a file that can be read by Microsoft ' ODBC and OLEDB database drivers by removing the Byte Order Marker.
On Error Resume Next
' Coding note: we're not doing any string-handling in VBA.Strings: allocating ' deallocating and (especially!) concatenating are SLOW. We are using the VBA ' Join and Split functions ONLY. Feel free to optimise further by declaring a ' faster set of string functions from the Kernel if you want to. ' ' Other optimisations: type pun. Byte Arrays are interchangeable with strings ' Some of our loops through these arrays have a 'step' of 2. This optimises a ' search-and-replace for ANSI chars in an array of 2-byte unicodes. Note that ' it's only used to remove known ANSI 'Latin' characters with a 'low' byte of ' zero: any other use of the two-byte 'step' will fail on non-Latin unicodes.
' ** THIS CODE IS IN THE PUBLIC DOMAIN ** ' Nigel Heffernan Excellerando.Blogspot.com
Const FETCH_ROWS As Long = 4096
Dim COMMA As String * 1 Dim BLANK As String * 4 Dim EOROW As String * 2
COMMA = ChrW$(44) BLANK = ChrW$(13) & ChrW$(10) & ChrW$(13) & ChrW$(10) EOROW = ChrW$(13) & ChrW$(10)
Dim FetchArray As Variant
Dim i As Long ' i for rows in the output file, records in the recordset Dim j As Long ' j for columns in the output file, fields in the recordset Dim k As Long ' k for all other loops: bytes in individual data items
Dim i_Offset As Long
Dim i_LBound As Long Dim i_UBound As Long Dim j_LBound As Long Dim j_UBound As Long Dim k_lBound As Long Dim k_uBound As Long
Dim hndFile As Long Dim varField As Variant
Dim iRowCount As Long Dim arrBytes() As Byte Dim arrTemp1() As String Dim arrTemp2() As String Dim arrTemp3(0 To 2) As String
Dim boolNumeric As Boolean
Dim strHeader As String Dim arrHeader() As Byte
Dim strFile As String Dim strPath As String Dim strExtn As String
strFile = FileName(OutputFile) strPath = FilePath(OutputFile) strExtn = FileExtension(strFile)
If rst Is Nothing Then Exit Function If rst.State <> 1 Then Exit Function
If strExtn = "" Then strExtn = ".csv" End If
With FSO
If strFile = "" Then strFile = .GetTempName strFile = Left(strFile, Len(strFile) - Len(".tmp")) strFile = strFile & strExtn End If If strPath = "" Then strPath = TempSQLFolder End If If Right(strPath, 1) <> "\" Then strPath = strPath & "\" End If strExtn = FileExtension(strFile) If strExtn = "" Then strExtn = ".csv" strFile = strFile & strExtn End If OutputFile = strPath & strFile
End With
If FileName(OutputFile) <> "" Then If Len(VBA.FileSystem.Dir(OutputFile, vbNormal)) <> 0 Then
Err.Clear VBA.FileSystem.Kill OutputFile ' do it now, and reduce wait for deletion If Err.Number = 70 Then ' permission denied: change the output file name OutputFile = FileStripExtension(OutputFile) & "_" & FileStripExtension(FSO.GetTempName) & FileExtension(OutputFile) End If End If
End If
' ChrW$() gives a 2-byte 'Wide' char. This coerces all subsequent operations to UTF16
arrTemp3(0) = ChrW$(34) ' Encapsulating quote arrTemp3(1) = vbNullString ' The field value will go here arrTemp3(2) = ChrW$(34) ' Encapsulating quote
If rst.EOF And rst.BOF Then FetchArray = Empty ElseIf rst.EOF Then rst.MoveFirst End If
' An empty recordset must still write a header row of field names: we put this in the ' output buffer and write it to the file before we start looping through the records.
ReDim FetchArray(0 To rst.Fields.Count, 0 To 0)
i_LBound = 0 i_UBound = 0
If IsMissing(FieldList) Then
For j = LBound(FetchArray, 1) To UBound(FetchArray, 1) - 1 Step 1 FetchArray(j, i_UBound) = rst.Fields(j).Name Next j
Else
j = 0 For Each varField In FieldList j_UBound = j_UBound + 1 Next varField ReDim arrTemp2(j_LBound To j_UBound) For Each varField In FieldList FetchArray(j, i_UBound) = CStr(varField) j = j + 1 Next varField
End If
ReDim arrTemp1(i_LBound To i_UBound) ' arrTemp1 is the rowset we write to file ReDim arrTemp2(j_LBound To j_UBound) ' arrTemp2 represents a single record
Do Until IsEmpty(FetchArray)
i_LBound = LBound(FetchArray, 2) i_UBound = UBound(FetchArray, 2) j_LBound = LBound(FetchArray, 1) j_UBound = UBound(FetchArray, 1) If UBound(arrTemp1) <> i_UBound + 1 Then ReDim arrTemp1(i_LBound To i_UBound + 1) arrTemp1(i_UBound + 1) = vbNullString ' The 'Join' operation will insert a trailing row End If ' delimiter here (Not required by the last chunk) If UBound(arrTemp2) <> j_UBound Then ReDim arrTemp2(j_LBound To j_UBound) End If ' Data body. This is heavily optimised to avoid VBA String functions with allocations For i = i_LBound To i_UBound Step 1 ' If this is confusing... Were you expecting FetchArray(i,j)? i for row, j for column? ' FetchArray comes from RecordSet.GetRows(), which returns a TRANSPOSED array: i and j ' are still the field and record ordinals, row(i) and column(j) in the output file. For j = j_LBound To j_UBound If IsNull(FetchArray(j, i)) Then arrTemp2(j) = "" Else arrTemp2(j) = FetchArray(j, i) ' confused? see he note above End If If CleanupText Or (i_UBound = 0) Then ' (i_UBound=0): always clean up field names arrBytes = arrTemp2(j) ' Integer arithmetic is faster than string-handling for ' this: all VBA string operations require an allocation For k = LBound(arrBytes) To UBound(arrBytes) Step 2 Select Case arrBytes(k) Case 10, 13, 9, 160 If arrBytes(k + 1) = 0 Then arrBytes(k) = 32 ' replaces CR, LF, Tab, and non-breaking End If ' spaces with the standard ANSI space Case 44 If Not CoerceText Then If arrBytes(k + 1) = 0 Then arrBytes(k) = 32 ' replace comma with the ANSI space End If End If Case 34 If arrBytes(k + 1) = 0 Then arrBytes(k) = 39 ' replaces double-quote with single quote End If End Select Next k arrTemp2(j) = arrTemp2(j) End If ' cleanup If CoerceText Then ' encapsulate all fields in quotes, numeric or not arrTemp3(1) = arrTemp2(j) arrTemp2(j) = Join$(arrTemp3, vbNullString) ElseIf (i = 0) And (i = i_UBound) Then ' always encapsulate field names arrTemp3(1) = arrTemp2(j) arrTemp2(j) = Join$(arrTemp3, vbNullString) Else ' selective encapsulation, leaving numeric fields unencapsulated: ' we *could* do this by reading the ADODB field types: but that's ' slower, and you may be 'caught out' by provider-specific types. arrBytes = arrTemp2(j) boolNumeric = True For k = LBound(arrBytes) To UBound(arrBytes) Step 2 If arrBytes(k) < 43 Or arrBytes(k) > 57 Then If arrBytes(k) <> 69 Then boolNumeric = False Exit For Else If k > UBound(arrBytes) - 5 Then boolNumeric = False Exit For ElseIf arrBytes(k + 2) = 45 Then ' detect "1.234E-05" ElseIf arrBytes(k + 2) = 43 Then ' detect "1.234E+05" Else boolNumeric = False Exit For End If End If End If Next k If boolNumeric Then For k = 1 + LBound(arrBytes) To UBound(arrBytes) Step 2 If arrBytes(k) <> 0 Then boolNumeric = False Exit For End If Next k End If arrBytes = vbNullString If Not boolNumeric Then ' text field, encapsulate it arrTemp3(1) = arrTemp2(j) arrTemp2(j) = Join(arrTemp3, vbNullString) End If End If ' CoerceText Next j arrTemp1(i) = Join(arrTemp2, COMMA) Next i iRowCount = iRowCount + i - 2 ' **** WHY WE 'PUT' A BYTE ARRAY INSTEAD OF A VBA STRING VARIABLE **** **** ' ' Put #hndFile, , StrConv(Join(arrTemp1, EOROW), vbUnicode) ' Put #hndFile, , Join(arrTemp1, EOROW) ' ' If you pass unicode, Wide or UTF-16 string variables to PUT, it prepends a ' Unicode Byte Order Mark to the data which, when written to your file, will ' render the field names illegible to Microsoft's JET ODBC and ACE-OLEDB SQL ' drivers (which can actually read unicode field names, if the helpful label ' isn't in the way). The primeval 'PUT' statement writes a Byte array as-is. ' ' **** **** **** **** **** **** **** **** **** **** **** **** **** **** **** arrBytes = Join$(arrTemp1, vbCrLf) If hndFile = 0 Then i_Offset = 1 If Len(Dir(OutputFile)) > 0 Then VBA.FileSystem.Kill OutputFile End If WaitForFileDeletion OutputFile hndFile = FreeFile Open OutputFile For Binary Access Write As #hndFile End If Put #hndFile, i_Offset, arrBytes i_Offset = i_Offset + 1 + UBound(arrBytes) Erase arrBytes If rst.EOF Then Erase FetchArray FetchArray = Empty Else If IsMissing(FieldList) Then FetchArray = rst.GetRows(FETCH_ROWS) Else FetchArray = rst.GetRows(FETCH_ROWS, , FieldList) End If End If
Loop ' until isempty(FetchArray)
If iRowCount < 1 Then ' iRowCount = 0 ' Row Count excludes the header End If
RecordsetToCSV = iRowCount
ExitSub:
On Error Resume Next If hndFile <> 0 Then Close #hndFile End If Erase arrBytes Erase arrTemp1 Erase arrTemp2 Exit Function
ErrSub:
Resume ExitSub
End Function
Public Function FilePath(Path As String) As String ' Strip the filename from a path, leaving only the path to the folder ' The last char of this path will be the backslash
' This does not check for the existence or accessibility of the file: ' all we're doing here is string-handling
Dim strPath As String Dim arrPath() As String
Const BACKSLASH As String * 1 = "\"
strPath = Trim(Path)
If strPath = "" Then Exit Function If Right$(strPath, 1) = BACKSLASH Then Exit Function
arrPath = Split(strPath, BACKSLASH)
If UBound(arrPath) = 0 Then ' does not contain "\" FilePath = "" Else arrPath(UBound(arrPath)) = vbNullString FilePath = Join$(arrPath, BACKSLASH) End If
Erase arrPath
End Function
Public Function FileName(Path As String) As String ' Strip the folder and path from a file's path string, leaving only the file name
' This does not check for the existence or accessibility of the file: ' all we're doing here is string-handling
Dim strPath As String Dim arrPath() As String
Const BACKSLASH As String * 1 = "\"
strPath = Trim(Path)
If strPath = "" Then Exit Function If Right$(strPath, 1) = BACKSLASH Then Exit Function
arrPath = Split(strPath, BACKSLASH)
If UBound(arrPath) = 0 Then ' does not contain "\" FileName = Path Else FileName = arrPath(UBound(arrPath)) End If
Erase arrPath
End Function
Public Function FileExtension(Path As String) As String ' Return the extension of the file
' This is just string-handling: no file or path validation is attempted ' The file extension is deemed to be whatever comes after the final '.' ' The extension is returned with the dot, eg: ".txt" not "txt" ' If no extension is detected, FileExtension returns an empty string
Dim strFile As String Dim arrFile() As String Const DOT_EXT As String * 1 = "."
strFile = FileName(Path) strFile = Trim(strFile)
If strFile = "" Then Exit Function If Right$(strFile, 1) = DOT_EXT Then Exit Function
arrFile = Split(strFile, DOT_EXT)
If UBound(arrFile) = 0 Then ' does not contain "\" FileExtension = vbNullString Else FileExtension = arrFile(UBound(arrFile)) FileExtension = Trim(FileExtension) If Len(FileExtension) > 0 Then FileExtension = DOT_EXT & FileExtension End If End If
Erase arrFile
End Function
Public Function FileStripExtension(Path As String) As String ' Return the filename, with the extension removed
' This is just string-handling: no file validation is attempted ' The file extension is deemed to be whatever comes after the final '.' ' Both the dot and the extension are removed
Dim strFile As String Dim arrFile() As String Const DOT_EXT As String * 1 = "."
strFile = FileName(Path)
If strFile = "" Then Exit Function If Right$(strFile, 1) = DOT_EXT Then Exit Function
strFile = Trim(strFile)
arrFile = Split(strFile, DOT_EXT)
If UBound(arrFile) = 0 Then ' does not contain "\" FileStripExtension = vbNullString Else ReDim Preserve arrFile(LBound(arrFile) To UBound(arrFile) - 1) FileStripExtension = Join$(arrFile, DOT_EXT) End If
Erase arrFile
If strFile = "" Then strFile = .GetTempName strFile = Left(strFile, Len(strFile) - Len(".tmp")) strFile = strFile & strExtn End If If strPath = "" Then strPath = TempSQLFolder End If If Right(strPath, 1) <> "\" Then strPath = strPath & "\" End If strExtn = FileExtension(strFile) If strExtn = "" Then strExtn = ".csv" strFile = strFile & strExtn End If OutputFile = strPath & strFile
如果您还没有自己的版本,还需要三个路径和文件名实用程序功能:
字符串封装逻辑还有改进空间:正确方法是查找记录集的字段类型并相应地应用引号,结果可能比我的笨重更快字节计数方法。但是,我的方法是关于文件使用者以及他们期望看到的内容;并不总是与应该接受的内容对齐。
如果您成功编写了更快更强大的版本,请告诉我。