When I enter the API URL, I get a .csv file inside a .gz download. The data in the tab-separated .csv file look like this (first six lines):
00031483 "youtube.com" "2015-05-18 13:31:06" 26
00031483 "youtube.com/channel/UCgfzulPx4-ef6oOX6Jj6f9" "2015-05-18 13:31:32" 16
00031483 "youtube.com/watch?v=Qj8dEBHzev" "2015-05-18 13:31:48" 16
00031483 "youtube.com/my_videos?o=" "2015-05-18 13:32:04" 5
00031483 "youtube.com/my_videos?o=" "2015-05-18 13:32:09" 9
00031483 "youtube.com/edit?o=U&video_id=e-Yn8BTZTx" "2015-05-18 13:32:18" 40
When I try to read the file automatically from the API in R, for example with:
read.table(PageViewsURL, sep="\t", quote='"', header = FALSE)
I get the following error message:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 2 elements
In addition: Warning messages:
1: In read.table(PageViewsURL, sep = "\t", quote = "\"", header = FALSE) :
line 1 appears to contain embedded nulls
2: In read.table(PageViewsURL, sep = "\t", quote = "\"", header = FALSE) :
line 5 appears to contain embedded nulls
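I also tried downloading the file and unpacking it:
download.file(PageViewsURL, paste("E:/Pageviews_", substr(GMTtime,1,10), ".gz", sep=""))
This gives me a .gz archive containing a file with no extension, and reading that file gives me the same problem.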
Does anyone know how to read this file? It is large: 667,464 lines.
Unfortunately, for privacy reasons I cannot share the API URL or the file. For the same reason, I have also altered the URLs shown above.
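For reference, here is a minimal sketch of reading a gzip-compressed, tab-separated file like the one described above. It assumes the archive holds plain text in the platform's default encoding. One plausible cause of the "embedded nulls" warnings is that read.table() receives the still-compressed bytes. gzfile() decompresses on the fly, so read.table() sees ordinary text. The sample file, the column names (user_id, url, timestamp, seconds) and the column types are assumptions made for illustration. They are not taken from the API.

```r
# Build a small gzip-compressed, tab-separated sample mirroring the rows above.
# Assumption: the real download is plain gzip-compressed text.
sample_path <- tempfile(fileext = ".gz")
con <- gzfile(sample_path, "w")
writeLines(c(
  '00031483\t"youtube.com"\t"2015-05-18 13:31:06"\t26',
  '00031483\t"youtube.com/channel/UCgfzulPx4-ef6oOX6Jj6f9"\t"2015-05-18 13:31:32"\t16',
  '00031483\t"youtube.com/watch?v=Qj8dEBHzev"\t"2015-05-18 13:31:48"\t16',
  '00031483\t"youtube.com/my_videos?o="\t"2015-05-18 13:32:04"\t5',
  '00031483\t"youtube.com/my_videos?o="\t"2015-05-18 13:32:09"\t9',
  '00031483\t"youtube.com/edit?o=U&video_id=e-Yn8BTZTx"\t"2015-05-18 13:32:18"\t40'
), con)
close(con)

# gzfile() decompresses while reading; read.table() opens and closes the connection.
# colClasses keeps the leading zeros of the first column (hypothetical column names).
page_views <- read.table(
  gzfile(sample_path),
  sep = "\t", quote = '"', header = FALSE,
  colClasses = c("character", "character", "character", "integer"),
  col.names = c("user_id", "url", "timestamp", "seconds")
)
print(page_views)
```

For the real data, the same call would point at the archive saved by download.file(). Passing mode = "wb" to download.file() keeps the binary archive unaltered on Windows. If embedded-null warnings still appear after decompression, the file may be in a wide encoding such as UTF-16. In that case gzfile() accepts an encoding argument (for example encoding = "UTF-16LE"), but that encoding is a guess and would need checking against the actual file.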