这个问题会很长,我提前道歉。希望它仍然很清楚:我正试图在VBA中建立一个刮刀,从Google专利中提取专利引用。
为此,我有两个Excel工作表。在Sheets(“recent_patents”)中存储了一堆专利号:
column A: 1 2 3 4 5 6 7 8 9 10
column B: 6793915 6797730 6805855 6808712 6812240 6815452 6818218 6818642 6824791 6828324
第二张表格(“rec_out”)包含需要填充提取数据的7列(A到G):
A: Progress B: Input patent C: # fwd cites D: fwd_cite E: app_date F: grant_date G: Control Patent
棘手的部分(对我来说)是每个专利被引用的次数不同,所以我必须在innerHTML
和citecount
中构建一个寻找相同数据(fwd_cite,app_date,grant_date)的循环此循环的迭代次数总是不同。
所以我想把问题分成两个不同的函数extractcites
和ccount
。前者打开相关网页并提取前向引用次数并将其存储在变量Sub CountForwardCites
中,而第二个函数则可以循环1到ccount。这两个函数目前都放在Call extractcites
这(我认为)迫使我将 Private Sub CountForwardCites_Click()
' First you need to announce the needed variables
Dim ie As Object
Dim patent As String
Dim ccount As Integer
Dim patent_number As String
Dim the_HTML_To_Parse As String
Dim fwd_cite As String
Dim app_date As String
Dim grant_date As String
Worksheets("recent_patents").Select
'Creating an input row where the data is sourced from
input_row = 1
Do
DoEvents
input_row = input_row + 1
patent_number = Sheets("recent_patents").Range("B" & input_row).Value
Call citecount(patent_number, patent, ccount, fwd_cite, app_date, grant_date)
' Step 1: Repeat patent_number ccount times, add control patent, and ccount
Worksheets("rec_out").Select
sumcount = Sheets("rec_out").Range("B:B").Cells.SpecialCells(xlCellTypeConstants).Count ' this ensures the code does not always start filling up columns from row 2
countrows = sumcount
If ccount > 0 Then
Do
DoEvents
countrows = countrows + 1
Sheets("rec_out").Range("A" & countrows) = Sheets("recent_patents").Range("A" & input_row).Value
Sheets("rec_out").Range("B" & countrows) = patent_number
Sheets("rec_out").Range("C" & countrows) = countrows - sumcount
Sheets("rec_out").Range("G" & countrows) = patent
'Call GetForwardCites(the_HTML_To_Parse, fwd_cte, app_date, grant_date)
' Step 2: Get the patent citations
Call extractcites(patent_number, the_HTML_To_Parse, fwd_cite, app_date, grant_date)
Sheets("rec_out").Range("D" & countrows) = fwd_cite
Sheets("rec_out").Range("E" & countrows) = grant_date
Sheets("rec_out").Range("F" & countrows) = app_date
Loop Until countrows = ccount + sumcount
Else
countrows = countrows + 1
Sheets("rec_out").Range("B" & countrows) = patent_number
Sheets("rec_out").Range("C" & countrows) = ccount
Sheets("rec_out").Range("D" & countrows) = ccount
Sheets("rec_out").Range("E" & countrows) = "NA"
Sheets("rec_out").Range("F" & countrows) = "NA"
End If
Loop Until input_row = 22500
End Sub
置于循环中,这总是给我一个错误。
以下是当前代码:
Function extractcites(patent_number As String, the_HTML_To_Parse As String, fwd_cite As String, app_date As String, grant_date As String)
the_HTML_To_Parse = the_HTML_ToParse ' Transfer of the innerHTML string generated in citecount
ForwardCitesScraper.ScrapedCode.Text = the_HTML_To_Parse
fwd_cite = ""
app_date = ""
grant_date = ""
If patent_number = "" Then Exit Function
' Parse HTML up to "Referenced by</SPAN></DIV>" string is (deletes text before).
' I do this because that tag announces the start of the table I'm interested in
If InStr(the_HTML_To_Parse, "Referenced by</SPAN>") > 0 Then
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "Referenced by</SPAN></DIV>"), Len(the_HTML_To_Parse))
' Extract Forward citation Publication Number
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "/patents/") + 9, Len(the_HTML_To_Parse))
fwd_cite = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_ToParse, "</A></TD>") - 1)
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "-date-value"), Len(the_HTML_To_Parse))
' Extract Forward citation application date
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "-date-value") + 13, Len(the_HTML_To_Parse))
app_date = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_To_Parse, "</TD>") - 1)
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "patent-date-value"), Len(the_HTML_To_Parse))
' Extract Forward citation grant date
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "patent-date-value") + 19, Len(the_HTML_To_Parse))
grant_date = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_To_Parse, "</TD>") - 1)
Else:
' Extract Forward citation Publication Number
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "/patents/") + 9, Len(the_HTML_To_Parse))
fwd_cite = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_ToParse, "</A></TD>") - 1)
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "-date-value"), Len(the_HTML_To_Parse))
' Extract Forward citation application date
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "-date-value") + 13, Len(the_HTML_To_Parse))
app_date = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_To_Parse, "</TD>") - 1)
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "patent-date-value"), Len(the_HTML_To_Parse))
' Extract Forward citation grant date
the_HTML_To_Parse = Mid(the_HTML_To_Parse, InStr(the_HTML_To_Parse, "patent-date-value") + 19, Len(the_HTML_To_Parse))
grant_date = Mid(the_HTML_To_Parse, 1, InStr(the_HTML_To_Parse, "</TD>") - 1)
End If
End Function
步骤1似乎工作正常,因为它确实返回了所需的值。但是,步骤2总是在“无效的过程调用”中失败。
citecount
为了完整性,我还为Function citecount(patent_number As String, patent As String, ccount As Integer, fwd_cite As String, app_date As String, grant_date As String, the_HTML_ToParse)
patent = ""
ccount = 0
If patent_number = "" Then Exit Function
the_start:
' Open Browser
Set ie = CreateObject("InternetExplorer.Application")
'Set ie = New InternetExplorerMedium
ie.Top = 0
ie.Left = 0
ie.Width = 800
ie.Height = 600
ie.AddressBar = 0
ie.StatusBar = 0
ie.Toolbar = 0
ie.Visible = False 'If False we won't see the window navigation
On Error Resume Next
' Open correct webpage based on query of patent_number
ie.Navigate ("http://www.google.com/patents/US" & patent_number & "?")
' Wait for the page to load
Do
DoEvents
If Err.Number <> 0 Then
ie.Quit
Set ie = Nothing
GoTo the_start:
End If
Loop Until ie.ReadyState = 4
' Copy the HTML results into an object we can use
the_HTML_ToParse = ie.Document.Body.InnerHTML
the_HTML_ToParse = Mid(the_HTML_ToParse, InStr(the_HTML_ToParse, "<TABLE class=patent-bibdata>"), Len(the_HTML_ToParse))
' Extract Patent Publication Number (correctness check)
the_HTML_ToParse = Mid(the_HTML_ToParse, InStr(the_HTML_ToParse, "<TD class=patent-bibdata-heading>Publication number</TD>") + 90, Len(the_HTML_ToParse))
pat_n = Mid(the_HTML_ToParse, 1, InStr(the_HTML_ToParse, "</TD></TR>") - 1)
patent = pat_n
If InStr(the_HTML_ToParse, "Referenced by</A> (") > 0 Then
' Extract ccount
ForwardCitesScraper.ScrapedCode.Text = the_HTML_ToParse
the_HTML_ToParse = Mid(the_HTML_ToParse, InStr(the_HTML_ToParse, "Referenced by</A> (") + 19, Len(the_HTML_ToParse))
ref_by = Mid(the_HTML_ToParse, 1, InStr(the_HTML_ToParse, "),</SPAN>") - 1)
ForwardCitesScraper.ScrapedCode.Text = the_HTML_ToParse
ccount = ref_by
Else: ccount = 0
MsgBox "no fwd cites"
End If
ie.Quit
Set ie = Nothing
End Function
the_HTML_ToParse = the_HTML_To_Parse ' No string seems to be transferred
好的,这就是全部。 我个人认为问题出在其中
希望无论如何都可以提出任何建议。我不到一个星期前就开始了VBA,请原谅我,如果这个问题真的很愚蠢。
西蒙