如何使用VBA快速清除仅包含不可打印字符的所有单元格?

时间:2017-06-19 20:33:49

标签: excel-vba removing-whitespace vba excel

我经常处理Excel中的大量数据,以及我在"擦洗"中的一个步骤。它是找到仅包含不可打印字符的单元格并清除它们。

为了澄清,我对要清除的匹配单元格的定义是任何使用黑色字体格式化的单元格都不会在一张纸上打印任何墨水。

这包括仅包含:

的单元格
  • vbNullString,与此公式相同的值:=""
  • 任何类型的空间(我发现有很多不同的空格字符。)
  • 标签
  • 任何类型的换行
  • 任何其他非打印字符

一个好的答案会考虑:

  • 工作表可能包含数十万行Excel数据和数百万个单元格。
  • 匹配要清除的单元格可能包含多个非打印字符。
  • 工作表可能包含大范围的空范围。
  • 速度是一个重要因素。
  • 如果非可打印字符位于也包含可打印字符的单元格内,则必须保留这些字符。

我尝试了什么: 这是我的代码到目前为止,它将工作表单元格值保存到数组中,并逐个循环查找160或小于或等于32的Unicode字符数,这涵盖了大部分或全部不可打印据我所知,字符。它只发现单元格等于单个不可打印的字符(即它将错过仅包含两个空格的单元格)。在我的机器上,它每秒运行大约250,000个单元。

Public Sub EmptyAllBlankCells()

    ' Get the last row in the worksheet
    Dim maxRow As Long
    maxRow = GetMaxCell.Row

    ' Get the last column in the worksheet
    Dim MaxCol As Byte
    MaxCol = GetMaxCell.Column

    ' Create an array of all worksheet cell values.
    Dim arrData As Variant
    ReDim arrData(0 To maxRow, 0 To MaxCol)
    arrData = ActiveSheet.Range(Cells(1, 1), Cells(maxRow, MaxCol))

    ' Empty the contents of blank and whitespace only cells.
    Dim iRow As Long
    Dim iCol As Long
    Dim iCellText As Variant  ' or BYTE
    For iRow = 1 To UBound(arrData, 1)                  ' First array dimension is rows.
        For iCol = 1 To UBound(arrData, 2)              ' Second array dimension is columns.
            On Error Resume Next
            iCellText = AscW(arrData(iRow, iCol))
            On Error GoTo 0
            If iCellText <= 32 Or iCellText = 160 Then
                ' Cell contains only a single non-printable character.
                arrData(iRow, iCol) = Empty ' Empty the cell.
            End If
            iCellText = Empty
        Next iCol
    Next iRow


    ' Write array back to worksheet.
    Dim Destination As Range
    Set Destination = Range("A1")
    Destination.Resize(UBound(arrData, 1), UBound(arrData, 2)).Value = arrData

End Sub

Private Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
    'Returns the last cell containing a value, or A1 if Worksheet is empty

    Const NONEMPTY As String = "*"

    If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
    If WorksheetFunction.CountA(rng) = 0 Then
        Set GetMaxCell = rng.Parent.Cells(1, 1)
    Else
        Dim lRow As Range

        With rng
            Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                        After:=.Cells(1, 1), _
                                        SearchDirection:=xlPrevious, _
                                        SearchOrder:=xlByRows)
            If Not lRow Is Nothing Then

                Dim lCol As Range
                Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                            After:=.Cells(1, 1), _
                                            SearchDirection:=xlPrevious, _
                                            SearchOrder:=xlByColumns)

                Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)

            End If
        End With

    End If
End Function

1 个答案:

答案 0 :(得分:0)

这是一个难以回答的问题,因为空格不可打印的字符之间存在很大差异。空白区域基本上是一组您没有看到的“字符”(即格式,空格等),只需使用RegEx表达式即可删除。不可打印的字符是一个更复杂的野兽,因为它们依赖于您的设备(称为Device Context),例如屏幕或打印机,以及您的字体(某些值在一种字体中是“可见的”但不是另一个)。

如果您真的想要访问可打印字形列表(即您的特定设备和特定字体),则存在一些可以提取该信息的Windows APIs。关键的一个是GetFontUnicodeRanges()功能。在VBA中使用它是一项繁琐的功能,因为它填充了一种称为GLYPHSTRUCTURE的类型。此类型包含未知维度的数组,并且在VBA中,您无法将内存读取为未知大小的数据类型 - 一旦您知道大小,也无法为内存读取重新定义该类型。因此,获取数据需要一种迭代的内存复制。下面的代码向您展示了如何做,但我应该添加一个警告,我仍然是32位,所以你需要谷歌这些API将它们转换为64位(非常简单的任务)。

获取可打印字形列表

在模块的顶部,粘贴以下API:

Private Type WCRANGE
    wcLow As Integer
    cGlyphs As Integer
End Type
Private Type GLYPHSET
    cbThis As Long
    flAccel As Long
    cGlyphsSupported As Long
    cRanges As Long
    ranges() As WCRANGE
End Type
Private Declare Function CreateDC Lib "gdi32" Alias "CreateDCA" (ByVal lpDriverName As String, ByVal lpDeviceName As String, ByVal lpOutput As String, ByVal lpInitData As Any) As Long
Private Declare Function CreateFont Lib "gdi32.dll" Alias "CreateFontA" (ByVal nHeight As Long, ByVal nWidth As Long, ByVal nEscapement As Long, ByVal nOrientation As Long, ByVal fnWeight As Long, ByVal fdwItalic As Long, ByVal fdwUnderline As Long, ByVal fdwStrikeOut As Long, ByVal fdwCharSet As Long, ByVal fdwOutputPrecision As Long, ByVal fdwClipPrecision As Long, ByVal fdwQuality As Long, ByVal fdwPitchAndFamily As Long, ByVal lpszFace As String) As Long
Private Declare Function SelectObject Lib "gdi32.dll" (ByVal hDC As Long, ByVal hObject As Long) As Long
Private Declare Function DeleteObject Lib "gdi32.dll" (ByVal hObject As Long) As Long
Private Declare Function GetFontUnicodeRanges Lib "gdi32.dll" (ByVal hDC As Long, pGLYPHSET As Any) As Long
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (pDestination As Any, pSource As Any, ByVal lByteCount As Long)

在同一模块中,粘贴以下功能:

Private Function GetPrintables(fontName As String, rejects As String) As Collection
    Dim hDC As Long, hFont As Long, ret As Long, byteCount As Long
    Dim b() As Byte
    Dim i As Long, c As Long
    Dim gs As GLYPHSET
    Dim wc As WCRANGE
    Dim printables As Collection
    Dim rejectArray() As String
    Dim v As Variant
    Dim hit As Boolean


    'Create a design context and font.
    'Note: I've just used your screen context. Google if you want to use a printer.
    hDC = CreateDC("DISPLAY", vbNullString, vbNullString, ByVal 0&)
    hFont = CreateFont(0, 0, 0, 0, 400, 0, 0, 0, 1, 0, 0, 0, 0 Or 0, fontName)
    ret = SelectObject(hDC, hFont)

    'Find size of GLYPHSET structure.
    byteCount = GetFontUnicodeRanges(hDC, ByVal 0&)
    ReDim b(0 To byteCount - 1)
    ret = GetFontUnicodeRanges(hDC, b(0))
    'Populate first 4 items of GLYPHSET
    CopyMemory gs, b(0), LenB(gs) - LenB(wc)
    'Populate .ranges array of GLYPHSET
    ReDim gs.ranges(0 To gs.cRanges - 1)
    i = LenB(gs) - LenB(wc)
    For c = 0 To gs.cRanges - 1
        CopyMemory gs.ranges(c), b(i), LenB(wc)
        i = i + LenB(wc)
    Next

    'Delete the graphics objects
    DeleteObject hFont
    DeleteObject hDC

    'Create the list of printable glyphs
    rejectArray = Split(rejects, "|")
    Set GetPrintables = New Collection
    For c = 0 To gs.cRanges - 1
        wc = gs.ranges(c)
        For i = wc.wcLow To wc.wcLow + wc.cGlyphs - 1
            'Ignore printable glyphs in our reject list
            hit = False
            For Each v In rejectArray
                If i = CLng(v) Then
                    hit = True
                    Exit For
                End If
            Next
            If Not hit Then GetPrintables.Add True, CStr(i)
        Next
    Next
End Function

如果需要,您需要添加自己的错误处理,这将包括测试API的返回值是否为0(表示功能失败)。

测试您的值

我认为你会发现更容易扭转你的测试规则;也就是说,不是专门寻找不可打印的字符,而是尝试测试是否存在任何可打印的字符。如果测试结果为true,那么该值应保持不变(根据您的帖子)。因此,仍在您的模块中的代码将是:

Public Sub EmptyAllBlankCells()
    Dim ws As Worksheet
    Dim minCell As Range, maxCell As Range
    Dim arrData As Variant
    Dim printables As Collection
    Dim rejects As String
    Dim r As Long, c As Long, i As Long
    Dim str As String, chr As String
    Dim val As Long
    Dim hit As Boolean

    Set ws = Sheet1
    Set minCell = ws.Cells(1, 1)
    ' Get the last row in the worksheet
    Set maxCell = GetMaxCell(ws.UsedRange)

    ' Create an array of all worksheet cell values.
    arrData = ws.Range(minCell, maxCell).Value2

    'Acquire the list of printable characters.
    rejects = "&H0020|&H00A0|&H1680|&H180E|&H2000|&H2001|&H2002|&H2003|&H2004|&H2005|&H2006|&H2007|&H2008|&H2009|&H200A|&H200B|&H202F|&H205F|&H3000|&HFEFF"
    Set printables = GetPrintables(minCell.Font.Name, rejects)

    ' Empty the contents of blank and whitespace only cells.
    For r = 1 To UBound(arrData, 1)                  ' First array dimension is rows.
        For c = 1 To UBound(arrData, 2)              ' Second array dimension is columns.
            If Not IsEmpty(arrData(r, c)) Then
                str = CStr(arrData(r, c))
                hit = False
                For i = 1 To Len(str)
                    chr = Mid(str, i, 1)
                    val = AscW(chr)
                    If val < 0 Then val = 65536 + val
                    On Error Resume Next
                    hit = printables(CStr(val))
                    On Error GoTo 0
                    If hit Then Exit For
                Next
'                If Not hit Then arrData(r, c) = Empty
                If Not hit Then arrData(r, c) = arrData(r, c) & "N"
            End If
        Next
    Next

    ' Write array back to worksheet.
    ws.Range("A1").Resize(UBound(arrData, 1), UBound(arrData, 2)).Value = arrData

End Sub

您会注意到一个名为rejects的变量。这只是一个你仍然不想要的可打印字形列表 - 在示例代码中,它是一个unicode空格列表。

<强>速度

您将数据读入数组是正确的,特别是如果它们与您在帖子中建议的数量相当。为了使事情快速运行,我使用了Collection,其中包含每个可打印的字形,其键是转换为字符串的unicode值。这不是内存效率,但它会相对较快。