I've been trying to connect to a SharePoint list from RStudio.
Can you help me?
It seems the R-Odata package is limited to CSV files.
Answer 0 (score: 0)
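I don't have access to any SharePoint resources right now, and haven't for quite a while, but my guess is that you can ingest the rendered HTML the same way you would for any other HTML-based page. The example below uses a public BLS page rather than SharePoint.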
library(rvest)
webpage <- read_html("http://www.bls.gov/web/empsit/cesbmart.htm")
tbls <- html_nodes(webpage, "table")
head(tbls)
## {xml_nodeset (6)}
## [1] <table id="main-content-table"> \n\t<tr> \n\t\t<td id="secon ...
## [2] <table id="Table1" class="regular" cellspacing="0" cellpadding="0" x ...
## [3] <table id="Table2" class="regular" cellspacing="0" cellpadding="0" x ...
## [4] <table id="Table3" class="regular" cellspacing="0" cellpadding="0" x ...
## [5] <table id="Table4" class="regular" cellspacing="0" cellpadding="0" x ...
## [6] <table id="Exhibit1" class="regular" cellspacing="0" cellpadding="0" ...
Then...
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  .[3:4] %>%
  html_table(fill = TRUE)
str(tbls_ls)
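To try the same select-then-parse pattern without hitting the network, here is a minimal sketch that runs on an inline HTML string. The two tables and their ids (`T1`, `T2`) are made up for illustration. As far as I know, recent rvest releases fill ragged rows automatically, so `fill = TRUE` is left out here.

```r
library(rvest)

# Hypothetical two-table page, inlined so no network access is needed
page <- read_html('<html><body>
  <table id="T1"><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>
  <table id="T2"><tr><th>x</th></tr><tr><td>10</td></tr><tr><td>20</td></tr></table>
</body></html>')

# All tables on the page, parsed into a list of data frames
tables <- html_table(html_nodes(page, "table"))

# A single table picked by its id attribute
second <- html_table(html_node(page, "#T2"))

str(tables)
str(second)
```

Selecting by id is usually more robust than selecting by position, since positions shift whenever the page layout changes.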
Also...
# empty list to add table data to
tbls2_ls <- list()
# scrape Table 2. Nonfarm employment...
tbls2_ls$Table1 <- webpage %>%
  html_nodes("#Table2") %>%
  html_table(fill = TRUE) %>%
  .[[1]]
# Table 3. Net birth/death...
tbls2_ls$Table2 <- webpage %>%
  html_nodes("#Table3") %>%
  html_table() %>%
  .[[1]]
str(tbls2_ls)
...and finally
head(tbls2_ls[[1]], 4)
##   CES Industry Code CES Industry Title Benchmark Estimate Differences   NA
## 1            Amount            Percent      <NA>     <NA>          NA <NA>
## 2         00-000000      Total nonfarm   137,214  137,147          67  (1)
## 3         05-000000      Total private   114,989  114,884         105  0.1
## 4         06-000000    Goods-producing    18,675   18,558         117  0.6
# remove row 1 that includes part of the headings
tbls2_ls[[1]] <- tbls2_ls[[1]][-1,]
# rename table headings
colnames(tbls2_ls[[1]]) <- c("CES_Code", "Ind_Title", "Benchmark",
                             "Estimate", "Amt_Diff", "Pct_Diff")
head(tbls2_ls[[1]], 4)
##    CES_Code         Ind_Title Benchmark Estimate Amt_Diff Pct_Diff
## 2 00-000000     Total nonfarm   137,214  137,147       67      (1)
## 3 05-000000     Total private   114,989  114,884      105      0.1
## 4 06-000000   Goods-producing    18,675   18,558      117      0.6
## 5 07-000000 Service-providing   118,539  118,589      -50      (1)
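After the rename, the figures are still character strings with thousands separators, so they can't be used in arithmetic yet. Here is a minimal sketch of one way to convert them, using a hand-built stand-in for the first two rows printed above. The helper name `to_number` is my own, and `Pct_Diff` is left alone because it contains footnote markers such as `(1)`.

```r
# Hypothetical helper: strip thousands separators, then convert to numeric
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Hand-built stand-in for the first rows of tbls2_ls[[1]]
df <- data.frame(
  CES_Code  = c("00-000000", "05-000000"),
  Benchmark = c("137,214", "114,989"),
  Estimate  = c("137,147", "114,884"),
  Amt_Diff  = c("67", "105"),
  stringsAsFactors = FALSE
)

# Convert only the columns that hold plain numbers
num_cols <- c("Benchmark", "Estimate", "Amt_Diff")
df[num_cols] <- lapply(df[num_cols], to_number)
str(df)
```

The same `lapply` call should work on `tbls2_ls[[1]]` directly, assuming its columns look like the printed output.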
# alternative approach: the XML package
library(XML)
url <- "http://www.bls.gov/web/empsit/cesbmart.htm"
# read in HTML data
tbls_xml <- readHTMLTable(url)
typeof(tbls_xml)
## [1] "list"
length(tbls_xml)
## [1] 15
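Since `readHTMLTable()` returns every table it finds, you usually want to pull out just one. Here is a minimal sketch using the `which` argument on an inline HTML string. The two tables are invented for illustration, and parsing from text avoids any dependence on the live page.

```r
library(XML)

# Hypothetical page with two tables, inlined so no network access is needed
html <- '<html><body>
  <table id="Table1"><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>
  <table id="Table2"><tr><th>x</th><th>y</th></tr><tr><td>3</td><td>4</td></tr></table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Every table in the document, as a list of data frames
all_tbls <- readHTMLTable(doc, stringsAsFactors = FALSE)

# Only the second table, returned directly as a data frame
second <- readHTMLTable(doc, which = 2, stringsAsFactors = FALSE)

length(all_tbls)
str(second)
```

On the real page you would pass the result of `htmlParse(url)` (or the URL itself, as above) in place of the inline string.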
For more details, see the following links.
http://bradleyboehmke.github.io/2015/12/scraping-html-tables.html
https://statistics.berkeley.edu/computing/r-reading-webpages
http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/webscrape.html