I'm trying to download an Excel file from the web and read only the rows that contain the word " ORD".
fileUrl <-("http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls")
x <- getURLContent(fileUrl)
out <- read.table(fileUrl,x )
I'm using getURLContent, but I get warnings early in the process:
Warning messages:
1: In read.table(fileUrl, x) : line 1 appears to contain embedded nulls
2: In read.table(fileUrl, x) : line 2 appears to contain embedded nulls
3: In read.table(fileUrl, x) : line 3 appears to contain embedded nulls
4: In read.table(fileUrl, x) : line 4 appears to contain embedded nulls
5: In read.table(fileUrl, x) : line 5 appears to contain embedded nulls
6: In if (!header) rlabp <- FALSE : the condition has length > 1 and only the first element will be used
7: In if (header) { : the condition has length > 1 and only the first element will be used
8: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
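The resulting table "out" is almost unreadable. Does anyone know how to read only specific rows accurately, instead of importing the whole file and risking bad rows?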
Answer 0: (score: 1)
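One of the answers to this SO question suggests using the gdata library: download the Excel file from the web, then read it into a data frame with read.xls(). Something like this: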
library(gdata)
# mode="wb" keeps the binary .xls file intact on Windows
download.file("http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls", destfile="file.xls", mode="wb")
out <- read.xls("file.xls", header=TRUE, pattern="Some Pattern")
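The pattern argument tells read.xls() to skip everything before the first line that contains Some Pattern. Change the value to a string that lets you skip the preliminary material that comes before the actual data you want in the data frame.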
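Note that pattern only skips the leading lines; it does not filter the rows that follow. As a rough sketch of how the two steps fit together, the example below uses a small made-up set of CSV lines as a stand-in for the sheet (the column names and values are hypothetical, not taken from the HKEX report). It mimics the skip-until-first-match behaviour with base R and then keeps only the rows containing " ORD" using grepl().

```r
# Hypothetical stand-in for the sheet contents (not the real HKEX data).
csv_lines <- c(
  "Share Repurchases Report",
  "",
  "Company,Stock Code,Type,Shares",
  "Alpha Ltd,00001, ORD,1000",
  "Beta Ltd,00002, PREF,500",
  "Gamma Ltd,00003, ORD,250"
)

# Mimic pattern=: drop every line before the first one that matches.
first_match <- grep("Company", csv_lines, fixed = TRUE)[1]
body <- csv_lines[first_match:length(csv_lines)]

# Read the remaining lines as text so codes like "00001" keep their zeros.
out <- read.csv(text = body, header = TRUE, colClasses = "character")

# Keep only the rows in which any column contains " ORD".
has_ord <- apply(out, 1, function(row) any(grepl(" ORD", row, fixed = TRUE)))
ord_rows <- out[has_ord, , drop = FALSE]
print(ord_rows)
```

With a real data frame returned by read.xls(), the same has_ord filter should apply unchanged, assuming the " ORD" text survives the conversion with its leading space.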
Answer 1: (score: 1)
I just found the solution. Thanks to Tim for pointing me in the right direction:
library(gdata)
DownloadURL <- "http://www.hkexnews.hk/reports/sharerepur/documents/SRRPT20151211.xls"
out <- read.xls(DownloadURL, pattern="ORD", perl = "C:\\Perl64\\bin\\perl.exe")