Question

I am new to the R community. While writing my first program I ran into a silly problem! When I try to read an RDS file with the following code:

tweets <- readRDS("RDataMining-Tweets-20160212.rds")

I get the following error.

Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file 'RDataMining-Tweets-20160212.rds', probable reason 'No such file or directory'

What is the problem here?

Answer 1

Try opening the file this way, which avoids any possible problem with the file path:

tweets <- readRDS(file.choose())

This opens an interactive window in which you can select the file.

Answer 2

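Since we do not have access to your file, it is hard to know for certain, so let me give some examples of what other types of files might give you.

First, some files: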

ctypes <- list(FALSE, 'gzip', 'bzip2', 'xz')
saverds_names <- sprintf('saveRDS_%s.rds', ctypes)
save_names <- sprintf('save_%s.rda', ctypes)
ign <- mapply(function(fn,ct) saveRDS(mtcars, file=fn, compress=ct),
              saverds_names, ctypes)
ign <- mapply(function(fn,ct) save(mtcars, file=fn, compress=ct),
              save_names, ctypes)

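A common (unix-y) utility is file, which uses file signatures to determine the probable file type. (If you are on Windows, it is usually installed with Rtools, so look for it there. If Sys.which("file") is empty, look around wherever you installed Rtools for something like c:\Rtools\bin\file.exe.)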

Sys.which('file')
#                        file 
# "c:\\Rtools\\bin\\file.exe"

With that, let's see what file thinks these files might be:

str(lapply(saverds_names, function(fn) system2("file", fn, stdout=TRUE)))
# List of 4
#  $ : chr "saveRDS_FALSE.rds: data"
#  $ : chr "saveRDS_gzip.rds: gzip compressed data, from HPFS filesystem (OS/2, NT)"
#  $ : chr "saveRDS_bzip2.rds: bzip2 compressed data, block size = 900k"
#  $ : chr "saveRDS_xz.rds: XZ compressed data"
str(lapply(save_names, function(fn) system2("file", fn, stdout=TRUE)))
# List of 4
#  $ : chr "save_FALSE.rda: data"
#  $ : chr "save_gzip.rda: gzip compressed data, from HPFS filesystem (OS/2, NT)"
#  $ : chr "save_bzip2.rda: bzip2 compressed data, block size = 900k"
#  $ : chr "save_xz.rda: XZ compressed data"

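That helps a little. If you do not get one of these four strings back, you are probably looking at a corrupted file (or a misnamed one, i.e., not the .rds format we expect).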

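If it does return one of them, know that readRDS (the first four) and load (the last four) determine the compress= setting automatically, which means the file is most likely corrupted (or is some other form of compressed data; again, possibly misnamed).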
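If the file utility is not available, a similar check can be roughed out in R itself by looking at the first few bytes of the file. This is only a minimal sketch: sniff_compression is a hypothetical helper (not part of base R), and it assumes the standard gzip, bzip2 and xz signatures, so it cannot tell an uncompressed RDS file apart from arbitrary other data.

```r
# Hypothetical helper (not part of base R): guess the compression of a file
# from its leading "magic" bytes.
sniff_compression <- function(path) {
  bytes <- readBin(path, what = "raw", n = 6L)
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0x1f, 0x8b)))) {
    "gzip"
  } else if (starts_with(charToRaw("BZh"))) {
    "bzip2"
  } else if (starts_with(as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00)))) {
    "xz"
  } else {
    "unknown or uncompressed"
  }
}

# Usage on a throwaway file written with a known compression type
tmp <- tempfile(fileext = ".rds")
saveRDS(mtcars, tmp, compress = "xz")
sniff_compression(tmp)
```

A result of "unknown or uncompressed" for a file that was supposedly saved with the default settings would be one more hint that the file is damaged or is not really an RDS file.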

By contrast, some other file types return:

system2("file", "blank.accdb")
# blank.accdb: raw G3 data, byte-padded
system2("file", "Book1.xlsx")
# Book1.xlsx: Zip archive data, at least v2.0 to extract
system2("file", "Book1.xls")
# Book1.xls: OLE 2 Compound Document
system2("file", "j.fra.R") # code for this answer
# j.fra.R: ASCII text, with CRLF line terminators

(with CRLF is a Windows-y thing. *sigh*) The last one is also what you would see for CSV and similar text-based tabular files.

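In my opinion, @divibisan's suggestion that the file may be corrupted is the most likely culprit, though corruption can produce different output. An intact file reads back fine, even through a raw connection: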

file.size(saverds_names[1])
# [1] 3798
# read the whole file, not just one byte (readBin defaults to n = 1)
head(readRDS(rawConnection(readBin(saverds_names[1], what="raw", n=file.size(saverds_names[1])))), n=2)
#               mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

But incomplete data in a truncated file looks different: I truncated the file (externally, with dd), and the error I received was "Error in readRDS: error reading from connection\n".

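Looking at the source for R, that error string appears only in R_gzread, which suggests that R believes the file is compressed with "gzip" (which is the default, perhaps because it cannot positively identify any other obvious compression method).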

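This is not a great answer, but it may give you some insight into what could be going wrong. Unfortunately, the bottom line is that it is unlikely that any data can be recovered from a corrupted file.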
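The truncation experiment can also be reproduced without dd. The following is a rough sketch that cuts a freshly written file in half from within R and catches the resulting error; the temporary file names are made up for the demonstration, and the exact error text may differ between R versions.

```r
# Write an intact file (gzip-compressed by default)
intact <- tempfile(fileext = ".rds")
saveRDS(mtcars, intact)

# Keep only the first half of its bytes to simulate an interrupted download
all_bytes <- readBin(intact, what = "raw", n = file.size(intact))
truncated <- tempfile(fileext = ".rds")
writeBin(all_bytes[seq_len(length(all_bytes) %/% 2)], truncated)

# Reading the damaged copy fails; capture the error instead of stopping
result <- tryCatch(
  suppressWarnings(readRDS(truncated)),
  error = function(e) e
)
if (inherits(result, "error")) {
  message("Could not read file: ", conditionMessage(result))
}
```

Comparing file.size() of a downloaded file against the size reported by the website is a cheap way to spot this kind of damage before calling readRDS.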

Answer 3

There are two rds files on the website that support the book you are following:

http://www.rdatamining.com/data/

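One of them is named RDataMining-Tweets-20160212.rds. I suspect you used a browser to fetch it and it landed in your "downloads" folder, which happens to be a different folder from the one you see when you execute:

getwd()

You should try re-downloading from that website and check where the file was saved.

You can get a listing of the files in your current working directory with:

 list.files()

Make sure the file name appears in that output.

If that proves difficult, then this should also succeed:

# gzcon() is added because saveRDS() compresses with gzip by default and
# a url() connection does not decompress on its own
tweets2 <- readRDS( gzcon(url("http://www.rdatamining.com/data/RDataMining-Tweets-20160212.rds?attredirects=0&d=1")) )
str(tweets2)  # should be a complex and long output

The url function creates a "connection" that can be used to download the RDS file.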

Answer 4

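Chuckle - I recognize the data file name...

You appear to be using the reference data from Yanchang Zhao's presentation on "Text Mining with R". The dataset can be downloaded from the author's website [http://www.rdatamining.com/data].

Looking at the error, as Devyani Balyan said, the file you are trying to load is probably not in your working directory. Code examples usually put data files in a <project>\data structure. Yanchang follows this heuristic in his examples.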
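To illustrate the <project>/data convention, here is a small sketch that builds the path with file.path instead of relying on a bare file name. The project directory and the file name example.rds are hypothetical and live under tempdir() only so that the snippet can run anywhere.

```r
# Hypothetical project root; in practice this would be your project folder
project_dir <- file.path(tempdir(), "my_project")
data_dir <- file.path(project_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Save a small object into <project>/data (file name is made up)
rds_path <- file.path(data_dir, "example.rds")
saveRDS(head(mtcars), rds_path)

# A path built from the project root works whatever getwd() returns
example <- readRDS(rds_path)
```

With this layout, a script that calls readRDS("example.rds") would fail with the same "No such file or directory" warning unless the working directory happened to be the data folder.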

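I suggest you get a copy of a good R book to help you get started. My personal recommendation is "The R Book" by Michael Crawley.

Here is a working code example that I hope helps you make progress:

Code example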

# used to load multiple R packages at once
# and or install them.

ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

# Load the twitteR package
packages <- c("twitteR")
ipak(packages)

# Load Example dataset from the website

url <- "http://www.rdatamining.com/data/RDataMining-Tweets-20160212.rds"
destfile = "RDataMining-Tweets-20160212.rds"

# Download into the current directory for simplicity
# Normally I would recommend saving to <project>/data

# Check if file exists if not download it...
if (!file.exists(destfile)) {
  download.file(url, destfile)
}

## load tweets into R
tweets <- readRDS(destfile)
head(tweets, n = 3)

Console output

> ipak <- function(pkg){
+   new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
+   if (length(new.pkg))
+     install.packages(new.pkg, dependencies = TRUE)
+   sapply(pkg, require, character.only = TRUE)
+ }
> # Load twitterR Package
> packages <- c("twitteR")
> ipak(packages)
twitteR 
   TRUE 
> url <- "http://www.rdatamining.com/data/RDataMining-Tweets-20160212.rds"
> destfile = "RDataMining-Tweets-20160212.rds"
> # Check if file exists if not download it...
> if (!file.exists(destfile)) {
+   download.file(url, destfile)
+ }
> ## load tweets into R
> tweets <- readRDS(destfile)
> head(tweets, n = 3)
[[1]]
[1] "RDataMining: A Twitter dataset for text mining: @RDataMining Tweets extracted on 3 February 2016. Download it at **********"

[[2]]
[1] "RDataMining: Vacancy of Data Scientist – Healthcare Analytics, Adelaide, Australia\n***********"

[[3]]
[1] "RDataMining: Thanks to all for your ongoing support to ******** 3. Merry Christmas and Happy New Year!"

Answer 5

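First check your working directory with getwd(). Check whether the file you want to read is present in that directory. If it is not, change the directory with setwd(dir=<#pathwhere you want to change it>).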
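Those steps might look like the sketch below. The directory and the file demo.rds are hypothetical stand-ins created in a temporary location; with real data you would point setwd at the folder that actually holds your download.

```r
old_wd <- getwd()                       # remember where we started

# Hypothetical folder standing in for the one that holds the data file
work_dir <- tempfile("wd_demo_")
dir.create(work_dir)
saveRDS(mtcars, file.path(work_dir, "demo.rds"))

# A bare file name is looked up relative to the working directory
found_before <- file.exists("demo.rds")

setwd(work_dir)                         # switch to the folder with the file
found_after <- file.exists("demo.rds")
rds_files <- list.files(pattern = "\\.rds$")   # .rds files visible from here

setwd(old_wd)                           # restore the original directory
```

Here found_after is TRUE and rds_files contains "demo.rds", which is the state readRDS needs before a bare file name will work.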

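I do not think the file name itself is the problem, since the .rds extension is already there. You can also name the argument explicitly (argument names in R are case-sensitive, so it is file, not File), as in readRDS(file="RDataMining-Tweets-20160212.rds")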
