I am using the RSelenium package to download some data from a website, as shown below:
library(RSelenium)
rD = rsDriver()$client
rD$navigate('https://www.bseindia.com/corporates/ann.aspx?expandable=3')
rD$executeScript("document.getElementById('ctl00_ContentPlaceHolder1_txtDate').value = '04/09/2018';", args = list('Dummy'))
rD$executeScript("document.getElementById('ctl00_ContentPlaceHolder1_imgSubmit').click();", args = list('Dummy'))
Data = strsplit(rD$findElement(using = 'id', "ctl00_ContentPlaceHolder1_lblann")$getElementText()[[1]], "\n")[[1]]
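Unfortunately, the code above does not extract the links to the various PDF files available on that page. For example, getElementText() does not return the link below, even though I can see it when I view the HTML source: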
<span id="ctl00_ContentPlaceHolder1_lblann"><table cellpadding='4' cellspacing='1' width='100%' border='0'><tr><td class='announceheader' style='font-weight:bold; color:#ffffff' align='left' colspan='4'>04 Sep 2018</td></tr><tr><td class='TTHeadergrey' style='font-weight:bold;' valign='middle'>Infibeam Avenues Ltd - 539807 - Announcement under Regulation 30 (LODR)-Appointment of Statutory Auditor/s</td><td class='TTHeadergrey'> </td><td class = 'TTHeadergrey' valign='middle'><a class='tablebluelink' href = 'https://www.bseindia.com/xml-data/corpfiling/AttachHis/b64b5834-093e-4147-a45d-b14ca89fa330.pdf' target = '_blank'>
Any help on how to extract the information about the PDF files into Data as well would be much appreciated.
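To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes the raw HTML of the span can be read as a string (for example through getElementAttribute("innerHTML") rather than getElementText(), which returns only the visible text). The helper extract_pdf_links and the variable PdfLinks are hypothetical names of my own, not part of RSelenium, and the regular expression is only tuned to the snippet shown above.

```r
# Hypothetical helper (not part of RSelenium): pull PDF hrefs out of an HTML string.
extract_pdf_links <- function(html) {
  # Match href = '...pdf' or href="...pdf", allowing spaces around '='.
  pattern <- "href\\s*=\\s*['\"][^'\"]+\\.pdf['\"]"
  hits <- regmatches(html, gregexpr(pattern, html, ignore.case = TRUE, perl = TRUE))[[1]]
  # Strip the leading href = ' part and the trailing quote.
  links <- sub("^href\\s*=\\s*['\"]", "", hits, ignore.case = TRUE, perl = TRUE)
  links <- sub("['\"]$", "", links)
  unique(links)
}

# Sample input copied from the page source quoted above.
html <- "<td class = 'TTHeadergrey' valign='middle'><a class='tablebluelink' href = 'https://www.bseindia.com/xml-data/corpfiling/AttachHis/b64b5834-093e-4147-a45d-b14ca89fa330.pdf' target = '_blank'>"
print(extract_pdf_links(html))

# With a live session, the HTML string would presumably come from the same element:
# html <- rD$findElement(using = 'id', "ctl00_ContentPlaceHolder1_lblann")$getElementAttribute("innerHTML")[[1]]
# PdfLinks <- extract_pdf_links(html)
```

This only collects the links; matching each link to the corresponding announcement line in Data is a separate step that the sketch does not attempt.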