VBA-Google新闻搜索结果数

时间:2018-11-01 17:14:13

标签: html excel vba excel-vba xmlhttprequest

我有一个单元格,其中包含我想在Google新闻中搜索的内容。我希望代码返回该搜索的结果数。目前,我拥有该代码,该代码是我在网站上的其他地方找到的,并且不使用Google新闻,但即使如此,有时我还是会得到

  

运行时错误-2147024891(80070005)

经过70次左右的搜索,我无法再跑步。

Sub HawkishSearch()

Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim start_time As Date
Dim end_time As Date

lastRow = Range("B" & Rows.Count).End(xlUp).Row

Dim cookie As String
Dim result_cookie As String

start_time = Time
Debug.Print "start_time:" & start_time

For i = 2 To lastRow

    url = "https://www.google.co.in/search?q=" & Cells(i, 2) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText

If html.getElementById("resultStats") Is Nothing Then
    str_text = "0 Results"
Else
    str_text = html.getElementById("resultStats").innerText
End If
    Cells(i, 3) = str_text
    DoEvents
Next

end_time = Time
Debug.Print "end_time:" & end_time

Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

1 个答案:

答案 0 :(得分:2)

最佳选择(IMO)是使用Google News API并注册一个API密钥。然后,您可以使用包含搜索词的queryString并解析JSON响应以获取结果计数。我将在下面进行操作,并使用文章标题和链接填充一个集合。我使用了一个名为JSONConverter.bas的JSON解析器,您可以将其下载并添加到项目中。然后,您可以转到VBE>工具>引用>添加对Microsoft脚本运行时的引用。


来自API的示例JSON响应:

enter image description here

buffer表示您可以通过键访问的字典,{}表示您可以通过索引或通过[]循环访问的集合。

我使用键For Each从API返回的初始字典中检索总结果计数。

然后我循环浏览字典(文章)的集合,并提取故事标题和URL。

然后您可以在本地窗口中查看结果或打印出

“本地”窗口中的结果示例:

enter image description here


totalResults

循环:

如果以循环方式进行部署,则可以使用类Option Explicit Public Sub GetStories() Dim articles As Collection, article As Object Dim searchTerm As String, finalResults As Collection, json As Object, arr(0 To 1) Set finalResults = New Collection searchTerm = "Obama" With CreateObject("MSXML2.XMLHTTP") .Open "GET", "https://newsapi.org/v2/everything?q=" & searchTerm & "&apiKey=yourAPIkey", False .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" .send Set json = JsonConverter.ParseJson(.responseText) End With Debug.Print "total results = " & json("totalResults") Set articles = json("articles") For Each article In articles arr(0) = article("title") arr(1) = article("url") finalResults.Add arr Next Stop '<== Delete me later End Sub 来保存XMLHTTP对象。这比创建和销毁更为有效。我为该类提供了一种方法clsHTTP,用于从API检索JSON响应,以及一种GetString方法,用于分析JSON并检索结果计数以及API结果的URL和标题。

“本地”窗口中结果结构的示例:

enter image description here

clsHTTP类:

GetInfo

标准模块:

Option Explicit   
Private http As Object

Private Sub Class_Initialize()
    Set http = CreateObject("MSXML2.XMLHTTP")
End Sub

Public Function GetString(ByVal url As String) As String
    With http
        .Open "GET", url, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        GetString = .responseText
    End With
End Function

Public Function GetInfo(ByVal json As Object) As Variant
    Dim results(), counter As Long, finalResults(0 To 1), articles As Object, article As Object

    finalResults(0) = json("totalResults")
    Set articles = json("articles")

    ReDim results(1 To articles.Count, 1 To 2)

    For Each article In articles
        counter = counter + 1
        results(counter, 1) = article("title")
        results(counter, 2) = article("url")
    Next

    finalResults(1) = results
    GetInfo = finalResults
End Function

否则:

我将在以下内容中按类别名称获取故事链接。我得到计数​​,然后将链接写到收藏夹中

Option Explicit

Public Sub GetStories()
    Dim http As clsHTTP, json As Object
    Dim finalResults(), searchTerms(), searchTerm As Long, url As String
    Set http = New clsHTTP

    With ThisWorkbook.Worksheets("Sheet1")
        searchTerms = Application.Transpose(.Range("A1:A2")) '<== Change to appropriate range containing search terms
    End With

    ReDim finalResults(1 To UBound(searchTerms))

    For searchTerm = LBound(searchTerms, 1) To UBound(searchTerms, 1)

        url = "https://newsapi.org/v2/everything?q=" & searchTerms(searchTerm) & "&apiKey=yourAPIkey"

        Set json = JsonConverter.ParseJson(http.GetString(url))

        finalResults(searchTerm) = http.GetInfo(json)

        Set json = Nothing

    Next

    Stop '<==Delete me later
End Sub

'

标准Google搜索:

以下内容是标准Google搜索示例,但根据您的搜索字词,您将不一定总是获得相同的HTML结构。您将需要提供一些失败的案例,以帮助我确定是否存在可以应用的一致选择器方法。

Option Explicit

Public Sub GetStories()
    Dim sResponse As String, html As HTMLDocument, articles As Collection
    Const BASE_URL As String = "https://news.google.com/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNRGxqTjNjd0VnSmxiaWdBUAE?hl=en-US&gl=US&ceid=US:en", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument: Set articles = New Collection
    Dim numberOfStories As Long, nodeList As Object, i As Long
    With html
        .body.innerHTML = sResponse
        Set nodeList = .querySelectorAll(".VDXfz")
        numberOfStories = nodeList.Length
        Debug.Print "number of stories = " & numberOfStories
        For i = 0 To nodeList.Length - 1
            articles.Add Replace$(Replace$(nodeList.item(i).href, "./", BASE_URL), "about:", vbNullString)
        Next
    End With
    Debug.Print articles.Count
End Sub