Scraping a table from a website using rvest

Time: 2019-09-30 12:57:55

Tags: r

I am trying to scrape a table from the Treasury website.

https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldYear&year=2019

I am currently pulling in the data with:

library("rvest")
url <- "https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yieldAll"

data <- url %>%
  read_html()  # html() is deprecated in rvest; read_html() is its replacement

However, I can't seem to convert the result into a table format using this function:

data %>%
  html_table()

1 Answer:

Answer 0: (score: 1)

It is best to first use a CSS selector to target the node that contains the table. The table is large (about 7,400 rows); rendering it with html_table took 30 seconds.
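A minimal sketch of that approach, under some assumptions: the class name t-chart is a guess at how the data table is marked up (it is not confirmed in this thread), and the inline HTML with placeholder values stands in for the real page so the example runs offline. For the live page, read_html(url) would replace the inline string.

```r
library(rvest)

# Stand-in HTML so the sketch runs without network access.
# The class "t-chart" and all cell values are illustrative assumptions.
page <- read_html('<html><body>
  <table class="layout"><tr><td>navigation</td></tr></table>
  <table class="t-chart">
    <tr><th>Date</th><th>1 Mo</th><th>3 Mo</th></tr>
    <tr><td>01/02/19</td><td>1.11</td><td>2.22</td></tr>
    <tr><td>01/03/19</td><td>3.33</td><td>4.44</td></tr>
  </table>
</body></html>')

yields <- page %>%
  html_node("table.t-chart") %>%  # pick only the node holding the data table
  html_table()                    # convert that single node to a data frame

print(yields)
```

Selecting the node first means html_table() only has to parse one table rather than every table on the page, which may help on a page that also uses tables for layout.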