我经常处理Excel中的大量数据,以及我在"擦洗"中的一个步骤。它是找到仅包含不可打印字符的单元格并清除它们。
为了澄清,我对要清除的匹配单元格的定义是任何使用黑色字体格式化的单元格都不会在一张纸上打印任何墨水。
这包括仅包含:
的单元格一个好的答案会考虑:
我尝试了什么: 这是我的代码到目前为止,它将工作表单元格值保存到数组中,并逐个循环查找160或小于或等于32的Unicode字符数,这涵盖了大部分或全部不可打印据我所知,字符。它只发现单元格等于单个不可打印的字符(即它将错过仅包含两个空格的单元格)。在我的机器上,它每秒运行大约250,000个单元。
Public Sub EmptyAllBlankCells()
' Get the last row in the worksheet
Dim maxRow As Long
maxRow = GetMaxCell.Row
' Get the last column in the worksheet
Dim MaxCol As Byte
MaxCol = GetMaxCell.Column
' Create an array of all worksheet cell values.
Dim arrData As Variant
ReDim arrData(0 To maxRow, 0 To MaxCol)
arrData = ActiveSheet.Range(Cells(1, 1), Cells(maxRow, MaxCol))
' Empty the contents of blank and whitespace only cells.
Dim iRow As Long
Dim iCol As Long
Dim iCellText As Variant ' or BYTE
For iRow = 1 To UBound(arrData, 1) ' First array dimension is rows.
For iCol = 1 To UBound(arrData, 2) ' Second array dimension is columns.
On Error Resume Next
iCellText = AscW(arrData(iRow, iCol))
On Error GoTo 0
If iCellText <= 32 Or iCellText = 160 Then
' Cell contains only a single non-printable character.
arrData(iRow, iCol) = Empty ' Empty the cell.
End If
iCellText = Empty
Next iCol
Next iRow
' Write array back to worksheet.
Dim Destination As Range
Set Destination = Range("A1")
Destination.Resize(UBound(arrData, 1), UBound(arrData, 2)).Value = arrData
End Sub
Private Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
'Returns the last cell containing a value, or A1 if Worksheet is empty
Const NONEMPTY As String = "*"
If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
If WorksheetFunction.CountA(rng) = 0 Then
Set GetMaxCell = rng.Parent.Cells(1, 1)
Else
Dim lRow As Range
With rng
Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByRows)
If Not lRow Is Nothing Then
Dim lCol As Range
Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByColumns)
Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
End If
End With
End If
End Function
答案 0 :(得分:0)
这是一个难以回答的问题,因为空格和不可打印的字符之间存在很大差异。空白区域基本上是一组您没有看到的“字符”(即格式,空格等),只需使用RegEx
表达式即可删除。不可打印的字符是一个更复杂的野兽,因为它们依赖于您的设备(称为Device Context
),例如屏幕或打印机,以及您的字体(某些值在一种字体中是“可见的”但不是另一个)。
如果您真的想要访问可打印字形列表(即您的特定设备和特定字体),则存在一些可以提取该信息的Windows APIs
。关键的一个是GetFontUnicodeRanges()
功能。在VBA中使用它是一项繁琐的功能,因为它填充了一种称为GLYPHSTRUCTURE的类型。此类型包含未知维度的数组,并且在VBA中,您无法将内存读取为未知大小的数据类型 - 一旦您知道大小,也无法为内存读取重新定义该类型。因此,获取数据需要一种迭代的内存复制。下面的代码向您展示了如何做,但我应该添加一个警告,我仍然是32位,所以你需要谷歌这些API将它们转换为64位(非常简单的任务)。
获取可打印字形列表
在模块的顶部,粘贴以下API:
Private Type WCRANGE
wcLow As Integer
cGlyphs As Integer
End Type
Private Type GLYPHSET
cbThis As Long
flAccel As Long
cGlyphsSupported As Long
cRanges As Long
ranges() As WCRANGE
End Type
Private Declare Function CreateDC Lib "gdi32" Alias "CreateDCA" (ByVal lpDriverName As String, ByVal lpDeviceName As String, ByVal lpOutput As String, ByVal lpInitData As Any) As Long
Private Declare Function CreateFont Lib "gdi32.dll" Alias "CreateFontA" (ByVal nHeight As Long, ByVal nWidth As Long, ByVal nEscapement As Long, ByVal nOrientation As Long, ByVal fnWeight As Long, ByVal fdwItalic As Long, ByVal fdwUnderline As Long, ByVal fdwStrikeOut As Long, ByVal fdwCharSet As Long, ByVal fdwOutputPrecision As Long, ByVal fdwClipPrecision As Long, ByVal fdwQuality As Long, ByVal fdwPitchAndFamily As Long, ByVal lpszFace As String) As Long
Private Declare Function SelectObject Lib "gdi32.dll" (ByVal hDC As Long, ByVal hObject As Long) As Long
Private Declare Function DeleteObject Lib "gdi32.dll" (ByVal hObject As Long) As Long
Private Declare Function GetFontUnicodeRanges Lib "gdi32.dll" (ByVal hDC As Long, pGLYPHSET As Any) As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (pDestination As Any, pSource As Any, ByVal lByteCount As Long)
在同一模块中,粘贴以下功能:
Private Function GetPrintables(fontName As String, rejects As String) As Collection
Dim hDC As Long, hFont As Long, ret As Long, byteCount As Long
Dim b() As Byte
Dim i As Long, c As Long
Dim gs As GLYPHSET
Dim wc As WCRANGE
Dim printables As Collection
Dim rejectArray() As String
Dim v As Variant
Dim hit As Boolean
'Create a design context and font.
'Note: I've just used your screen context. Google if you want to use a printer.
hDC = CreateDC("DISPLAY", vbNullString, vbNullString, ByVal 0&)
hFont = CreateFont(0, 0, 0, 0, 400, 0, 0, 0, 1, 0, 0, 0, 0 Or 0, fontName)
ret = SelectObject(hDC, hFont)
'Find size of GLYPHSET structure.
byteCount = GetFontUnicodeRanges(hDC, ByVal 0&)
ReDim b(0 To byteCount - 1)
ret = GetFontUnicodeRanges(hDC, b(0))
'Populate first 4 items of GLYPHSET
CopyMemory gs, b(0), LenB(gs) - LenB(wc)
'Populate .ranges array of GLYPHSET
ReDim gs.ranges(0 To gs.cRanges - 1)
i = LenB(gs) - LenB(wc)
For c = 0 To gs.cRanges - 1
CopyMemory gs.ranges(c), b(i), LenB(wc)
i = i + LenB(wc)
Next
'Delete the graphics objects
DeleteObject hFont
DeleteObject hDC
'Create the list of printable glyphs
rejectArray = Split(rejects, "|")
Set GetPrintables = New Collection
For c = 0 To gs.cRanges - 1
wc = gs.ranges(c)
For i = wc.wcLow To wc.wcLow + wc.cGlyphs - 1
'Ignore printable glyphs in our reject list
hit = False
For Each v In rejectArray
If i = CLng(v) Then
hit = True
Exit For
End If
Next
If Not hit Then GetPrintables.Add True, CStr(i)
Next
Next
End Function
如果需要,您需要添加自己的错误处理,这将包括测试API的返回值是否为0(表示功能失败)。
测试您的值
我认为你会发现更容易扭转你的测试规则;也就是说,不是专门寻找不可打印的字符,而是尝试测试是否存在任何可打印的字符。如果测试结果为true,那么该值应保持不变(根据您的帖子)。因此,仍在您的模块中的代码将是:
Public Sub EmptyAllBlankCells()
Dim ws As Worksheet
Dim minCell As Range, maxCell As Range
Dim arrData As Variant
Dim printables As Collection
Dim rejects As String
Dim r As Long, c As Long, i As Long
Dim str As String, chr As String
Dim val As Long
Dim hit As Boolean
Set ws = Sheet1
Set minCell = ws.Cells(1, 1)
' Get the last row in the worksheet
Set maxCell = GetMaxCell(ws.UsedRange)
' Create an array of all worksheet cell values.
arrData = ws.Range(minCell, maxCell).Value2
'Acquire the list of printable characters.
rejects = "&H0020|&H00A0|&H1680|&H180E|&H2000|&H2001|&H2002|&H2003|&H2004|&H2005|&H2006|&H2007|&H2008|&H2009|&H200A|&H200B|&H202F|&H205F|&H3000|&HFEFF"
Set printables = GetPrintables(minCell.Font.Name, rejects)
' Empty the contents of blank and whitespace only cells.
For r = 1 To UBound(arrData, 1) ' First array dimension is rows.
For c = 1 To UBound(arrData, 2) ' Second array dimension is columns.
If Not IsEmpty(arrData(r, c)) Then
str = CStr(arrData(r, c))
hit = False
For i = 1 To Len(str)
chr = Mid(str, i, 1)
val = AscW(chr)
If val < 0 Then val = 65536 + val
On Error Resume Next
hit = printables(CStr(val))
On Error GoTo 0
If hit Then Exit For
Next
' If Not hit Then arrData(r, c) = Empty
If Not hit Then arrData(r, c) = arrData(r, c) & "N"
End If
Next
Next
' Write array back to worksheet.
ws.Range("A1").Resize(UBound(arrData, 1), UBound(arrData, 2)).Value = arrData
End Sub
您会注意到一个名为rejects
的变量。这只是一个你仍然不想要的可打印字形列表 - 在示例代码中,它是一个unicode空格列表。
<强>速度强>
您将数据读入数组是正确的,特别是如果它们与您在帖子中建议的数量相当。为了使事情快速运行,我使用了Collection
,其中包含每个可打印的字形,其键是转换为字符串的unicode值。这不是内存效率,但它会相对较快。