How to write VBA code to remove and replace UTF-8 characters

Date: 2017-08-07 10:11:47

Tags: excel vba excel-vba

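I have this code, but I still can't seem to replace the non-English characters in my data, such as Vietnamese or Thai, with a simple "placeholder".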

Sub NonLatin()
Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        s = cell.Value
            For i = 1 To Len(s)
                If Mid(s, i, 1) Like "[!A-Za-z0-9@#$%^&* * ]" Then cell.Value = "placeholder"
            Next
    Next
End Sub

Thanks for your help.

2 answers:

Answer 0 (score: 1)

You can replace any character that falls outside, e.g., the ASCII range (the first 128 characters) with a placeholder using the following code:

Option Explicit

Sub Test()

    Dim oCell As Range

    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Match any character outside the ASCII range U+0000 to U+007F
        .Pattern = "[^\u0000-\u007F]"
        For Each oCell In [A1:C4]
            oCell.Value = .Replace(oCell.Value, "*")
        Next
    End With

End Sub

Answer 1 (score: 0)

For details on using regular expressions in VBA code, see this question.

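Then use a regular expression in a function like this one to process the string. Here I assume you want to replace each invalid character with a placeholder, not the whole string. If you want to replace the whole string, you don't need the per-character check: use the + or * quantifier in the regular expression pattern to match multiple characters, and test the entire string at once.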

Function LatinString(ByVal str As String) As String
    ' After including a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' Set up the regular expressions object
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' This is the pattern of ALLOWED characters. 
        ' Note that special characters should be escaped using a backslash e.g. \$ not $
        .Pattern = "[A-Za-z0-9]"
    End With

    ' Loop through characters in string. Replace disallowed characters with "?"
    Dim i As Long
    For i = 1 To Len(str)
        If Not regEx.Test(Mid(str, i, 1)) Then
            str = Left(str, i - 1) & "?" & Mid(str, i + 1)
        End If
    Next i
    ' Return output
    LatinString = str
End Function

You can use it in your code like this:

Dim cell As Range
For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    cell.Value = LatinString(cell.Value)
Next
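
The whole-string variant mentioned above (testing the entire value at once with a quantifier instead of looping over characters) might look roughly like the sketch below. The function name `LatinOrPlaceholder`, its parameters, and the default placeholder text are hypothetical and not part of the original answer. It uses late binding via `CreateObject`, so no library reference is assumed.

```vba
Option Explicit

' Hypothetical helper: returns the input unchanged when every character is
' in the allowed set, otherwise returns the placeholder for the whole string.
Function LatinOrPlaceholder(ByVal inputText As String, _
                            Optional ByVal placeholder As String = "placeholder") As String
    ' Late binding, so no reference to the VBScript regex library is needed
    With CreateObject("VBScript.RegExp")
        ' ^ and $ anchor the match to the whole string; * allows any length,
        ' including the empty string
        .Pattern = "^[A-Za-z0-9]*$"
        If .Test(inputText) Then
            LatinOrPlaceholder = inputText
        Else
            LatinOrPlaceholder = placeholder
        End If
    End With
End Function
```

It would be called the same way as `LatinString`, e.g. `cell.Value = LatinOrPlaceholder(cell.Value)`. The allowed set here mirrors the pattern in the answer, so spaces and punctuation also trigger the placeholder unless they are added to the character class.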

For a byte-level approach that converts a Unicode string to a UTF-8 string without using regular expressions, see this article.
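
As a simpler regex-free alternative, the characters can also be checked one by one by code point with `AscW`. This is not the UTF-8 conversion that the linked article describes. It is only a minimal sketch of the replacement task from the question, and the name `ReplaceNonAscii` and its parameters are hypothetical.

```vba
Option Explicit

' Hypothetical helper: replaces every character outside the ASCII range
' (code points 0-127) with a placeholder, without regular expressions.
Function ReplaceNonAscii(ByVal inputText As String, _
                         Optional ByVal placeholder As String = "?") As String
    Dim i As Long
    Dim code As Long
    Dim result As String

    For i = 1 To Len(inputText)
        ' AscW returns a signed 16-bit value, so code units above 32767
        ' come back negative; treat those as non-ASCII as well
        code = AscW(Mid$(inputText, i, 1))
        If code < 0 Or code > 127 Then
            result = result & placeholder
        Else
            result = result & Mid$(inputText, i, 1)
        End If
    Next i

    ReplaceNonAscii = result
End Function
```

Because VBA strings are made of 16-bit code units, a character stored as a surrogate pair would produce two placeholders with this approach.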