使用VBA从Google翻译中提取div的内容

时间:2017-11-12 12:48:58

标签: html regex excel vba excel-vba

我有翻译语言的功能:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, Trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        Translate = CleanA(Trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function

中文翻译:

A1 = Hello

B1 = Translate(A1,"en","zh-cn")

结果是“Nǐhǎo”,正确的结果是“你好”

链接谷歌:https://translate.google.com/m?hl=en&sl=en&tl=zh-CN&ie=UTF-8&prev=_m&q=hello

我想要结果:

B1 = 你好

C1 = Nǐ hǎo

我想我需要修复此代码:

Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")

请帮帮我,谢谢!

2 个答案:

答案 0 :(得分:3)

该页面返回两个<div>,如:

<div dir="ltr" class="o1">Nǐ hǎo</div>
<div dir="ltr" class="t0">你好</div>

建议您不要尝试使用正则表达式解析HTML,因为解析HTML很容易遇到困难 - 您可以使用VBA中的Microsoft HTML Object Library来获得类似的结果。

要获取这两个<div>代码的内容,您可以按照以下示例使用此代码:

' o1 has Anglicised translation, t0 as tranlsation in target language
Set objDivs = objHTML.getElementsByTagName("div")
For Each objDiv In objDivs
    If objDiv.className = "o1" Then
        strTranslatedO1 = objDiv.innerText
    End If
    If objDiv.className = "t0" Then
        strTranslatedT0 = objDiv.innerText
    End If
Next objDiv

这基本上是循环遍历返回的HTML中的所有<div>标记,并检查类名o1t0,然后获取innerText属性。使用此技术,您可以获取已翻译的值并将其写回工作表,例如:

enter image description here

完整代码:

Option Explicit

Public Sub Test()

    Dim ws As Worksheet

    ' testing worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ws.Cells.Delete

    ' test inputs
    ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1")
    ws.Range("A2:A4") = "hello"
    ws.Range("B2:B4") = "English"
    ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian"))

    ' test
    ws.Range("D2") = Translate("hello", "en", "zh-cn", True)
    ws.Range("E2") = Translate("hello", "en", "zh-cn", False)
    ws.Range("D3") = Translate("hello", "en", "es", True)
    ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet
    ws.Range("D4") = Translate("hello", "en", "ru", True)
    ws.Range("E4") = Translate("hello", "en", "ru", False)

End Sub

Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String

    Dim strURL As String
    Dim objHTTP As Object
    Dim objHTML As Object
    Dim objDivs As Object, objDiv
    Dim strTranslatedT0 As String
    Dim strTranslatedO1 As String

    ' send query to web page
    strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _
        "&sl=" & strFromLanguageCode & _
        "&tl=" & strToLanguageCode & _
        "&ie=UTF-8&prev=_m&q=" & strInput

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "GET", strURL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ""

    ' create a html document
    Set objHTML = CreateObject("htmlfile")
    With objHTML
        .Open
        .Write objHTTP.responseText
        .Close
    End With

    ' o1 has Anglicised translation, t0 as tranlsation in target language
    Set objDivs = objHTML.getElementsByTagName("div")
    For Each objDiv In objDivs
        If objDiv.className = "o1" Then
            strTranslatedO1 = objDiv.innerText
        End If
        If objDiv.className = "t0" Then
            strTranslatedT0 = objDiv.innerText
        End If
    Next objDiv

    ' choose which to return
    If blnTargetAlphabet Then
        Translate = strTranslatedT0
    Else
        Translate = strTranslatedO1
    End If

CleanUp:
    Set objHTML = Nothing
    Set objHTTP = Nothing

End Function

答案 1 :(得分:0)

@Robin Mackenzie 解决方案适用于这些修复:

类似于Youtube Automate Language Translations Using Excel VBA by Dinesh Kumar Takyar

  1. "t0" 已更改。

替换If objDiv.className = "t0" Then
 If objDiv.className = "result-container" Then
  1. 对于编码问题(例如希伯来语),请在开头添加(Office 2013 及更高版本):

strInput = WorksheetFunction.EncodeURL(strInput)

注意:不要看谷歌翻译常规网页HTML,“/m?”代表“移动”,使用 Google 翻译移动页面,该页面具有不同且更简单的 HTML 代码。