Excel VBA Web Scraper;用引号提取的文本

时间:2014-05-13 09:16:54

标签: excel vba excel-vba

Hello Stackoverflow社区。

我在VBA中创建了一个Web抓取工具,以获取Yahoo!股票报价的行业/行业数据财务网站到我的excel文件。问题是,虽然我有单元格值,例如技术,但真正的文字是“技术”(发现当我复制并且Ctrl-F时文本)。 VLOOKUP或SUMIF等功能正在返回#N / A,因为它与另一个工作表上的单元格值技术一起使用。如果有人可以帮我解决这个问题,我真的很感激,好像我自己找不到解决办法一样。宏在下面。

Sub SectInd()

Sheet1.Activate
Set browser = CreateObject("InternetExplorer.Application")
browser.Visible = False

Dim Lastr As Integer: Lastr = Sheet1.Range("A5000").End(xlUp).Row


If Lastr > 2 Then

For a = 3 To Lastr
Dim Quote As String: Quote = Sheet1.Cells(a, 1).Value
Dim URL As String: URL = "http://finance.yahoo.com/q/in?s=" & Quote & "+Industry"

the_start:

browser.Navigate (URL)

Do
DoEvents
If Err.Number <> 0 Then
browser.Quit
Set browser = Nothing
GoTo the_start:
End If

Loop Until browser.ReadyState = 4

WebText = browser.Document.Body.InnerText
If InStr(WebText, "Sector:") > 0 Then
WebText2 = Mid(WebText, InStr(WebText, "Sector:"), 100)
TextSector = Split(WebText2, Chr(10))(1)
TextIndustry = Split(WebText2, Chr(10))(4)
End If
Sheet1.Cells(a, 4).Value = TextSector
Sheet1.Cells(a, 5).Value = TextIndustry


Next a
End If

Sheet1.Cells.Columns.AutoFit
End Sub

P.S。 Debug.Print还返回没有引号的文本(即技术,而不是“技术”

2 个答案:

答案 0 :(得分:0)

我已经运行了你的代码。而且我不知道你在哪里得到双引号所以它看起来很好。

如果您想查看引号是否实际上是字符串的一部分,请尝试检查长度。字符串“cat”如果有引号则有5个字母,如果IDE添加任何可见引号,则只有3个字母,以说明它是一个字符串。

所以使用Len(TextSector)来查找字符串的长度。

答案 1 :(得分:0)

几年前我做了类似的事情,但我使用了msxml2.xmlhttp而不是IE对象。也许这段代码可以帮到你。

Public Declare Function DeleteUrlCacheEntry Lib "Wininet.dll" _
Alias "DeleteUrlCacheEntryA" _
(ByVal lpszUrlName As String) As Long

Public Function getGoogPrice(symbol As String) As Variant
    Dim xmlhttp As Object
    Dim strURL As String
    Dim CompanyID As String
    Dim X As String, Y As Variant
    Dim sSearch As String
    strURL = "http://www.google.com/finance?q=" & symbol
    DeleteUrlCacheEntry (strURL)
    Set xmlhttp = CreateObject("msxml2.xmlhttp")
    With xmlhttp
        .Open "get", strURL, False
        .send
        X = .responseText
    End With
    symbol = UCase(symbol)
    Set xmlhttp = Nothing
    getGoogPrice = Split(Split(X, symbol)(UBound(Split(X, symbol))), """,""")(1)
End Function

Public Function getReutersPrice(symbol As String) As Variant
    Dim xmlhttp As Object
    Dim strURL As String
    Dim CompanyID As String
    Dim X As String
    Dim sSearch As String, myDIV As String, myPrice As String

    strURL = "http://www.reuters.com/finance/stocks/overview?symbol=" & symbol 'NESN.VX"
    DeleteUrlCacheEntry (strURL)
    Set xmlhttp = CreateObject("msxml2.xmlhttp")
    With xmlhttp
        .Open "get", strURL, False
        .send
        X = .responseText
    End With
    Set xmlhttp = Nothing
    sSearch = "sectionQuoteDetail"
    myDIV = Mid(X, InStr(1, X, sSearch) + Len(sSearch))
    myDIV = Trim(Mid(myDIV, 1, InStr(1, myDIV, "</div>") - 1))
    Y = Split(myDIV, "</span>")
    myPrice = Mid(Y(1), InStrRev(Y(1), ">") + 1)
    myPrice = Replace(myPrice, Chr(13), "")
    myPrice = Trim(Replace(myPrice, Chr(9), ""))
    getReutersPrice = myPrice
End Function