Question

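I'd like to know whether R has something that lets me update a file rather than re-saving all of the data.

Perhaps there is something like sqldf::read.csv.sql, but for saving.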

Suppose I have the iris data stored as a .csv:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa

But I have realized that the second flower is virginica, so I want to change the second row:

2          4.9         3.0          1.4         0.2  virginica

I know I can read the file, change Species, and save it again, but the more rows my file has (i.e. > 10⁶), the less efficient this approach becomes.
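For reference, the read-modify-write approach described above might look like the following sketch. The temporary file and the three-row subset are stand-ins chosen for illustration only; the point is that the whole file is read and rewritten to change a single value.

```r
# Write a small CSV to a temporary file to stand in for the real data
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris, 3), csv_path, row.names = FALSE)

# Read everything, change one value, then write everything back
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
dat$Species[2] <- "virginica"
write.csv(dat, csv_path, row.names = FALSE)

# Re-read to confirm the change, then clean up
result <- read.csv(csv_path, stringsAsFactors = FALSE)
unlink(csv_path)
```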

Answer 1

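In general, R is really not meant for in-place file editing, and I know of no (currently available) tools in any context that support it. Even unixy tools like sed make quick edits, but they are still not technically "in place": sed -i writes a temporary copy and renames it over the original, which hides how it works. (There might be some tools that do it, but likely not with the ease of use you are after.)

There is one notable exception, a file format designed for in-place editing (well, interaction). It includes the important in-place operations of adding, filtering, replacing, and deleting. In most cases it does so without growing the file along the way. It is SQLite.

For example,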

library(DBI)
# library(RSQLite) # don't need to load it, just need to have it available
fname <- "./iris.sqlite3"
con <- dbConnect(RSQLite::SQLite(), fname)
file.info(fname)$size
# [1] 0
dbWriteTable(con, "iris", iris)
# [1] TRUE
file.info(fname)$size
# [1] 16384
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          4.7         3.2          1.6         0.2  setosa
file.info(fname)$size
# [1] 16384
dbExecute(con, "update iris set [Species]='virginica' where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
# [1] 1
dbGetQuery(con, "select * from iris where [Sepal.Length]=4.7 and [Sepal.Width]=3.2 and [Petal.Length]=1.6")
#   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 1          4.7         3.2          1.6         0.2 virginica
dbDisconnect(con)
file.info(fname)$size
# [1] 16384
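The update above locates the row by its measurements. As a minimal sketch of the question's literal goal (changing the second row), one could instead target SQLite's implicit rowid and pass the values through the params argument of dbExecute(). This assumes the table was created with dbWriteTable(), so that rows received rowid values 1, 2, 3, ... in insertion order; the in-memory database is used here only to keep the sketch self-contained.

```r
library(DBI)

# In-memory database so the sketch leaves no file behind (illustration only)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)

# Update one row, identified by the implicit rowid assigned on insert;
# the ? placeholders are filled from params, in order
n_changed <- dbExecute(
  con,
  "UPDATE iris SET Species = ? WHERE rowid = ?",
  params = list("virginica", 2L)
)

# Read back the first three rows to check the change
first_rows <- dbGetQuery(
  con,
  "SELECT rowid, Species FROM iris WHERE rowid <= 3 ORDER BY rowid"
)
dbDisconnect(con)
```

Binding values through params also avoids pasting user-supplied strings into the SQL text.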

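Pros

It is cross-platform. It is far more widespread than most people realize, being an integral internal component of the Firefox browser and the Android operating system. (Many others, too.)
Additionally, drivers exist for most programming languages, including R, Python, Ruby, and far too many others to list here.
There is effectively no practical limit on the amount of data that can be stored in a single SQLite file. The linked page cited a theoretical limit of roughly 140TB when this was written (https://www.sqlite.org/whentouse.html), and newer releases document an even larger one, but if you are getting that large, there are many (reasonable) arguments for a different solution.
Pulling data is built on the SQL standard; although it is not 100% compliant, it is pretty darn close. Query time/performance depends on the size of your queries, but it is generally quite fast (ref: Will SQLite performance degrade if the database size is greater than 2 gigabytes?)
In fact, it can be faster than individual file operations.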

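Cons

There is "overhead" in file size. Notably, iris occupies under 7K in memory (see object.size(iris)), yet the file size starts at 16K. With larger data, the gap ratio (file size to actual data) shrinks. (I did the same with ggplot2::diamonds; the object is 3456376 bytes and the file size is 3780608, an overhead of less than 10%.)
The file size grows when SQLite deems it necessary. This depends on many factors outside the scope of R and of this question/answer.
If you delete a lot of data, the file size does not immediately shrink to match... see change sqlite file size after "DELETE FROM table" (hint: vacuum).
There are many tools that can import data from this file format easily/immediately, but notably absent are Excel and Access. Using SQLite-ODBC is feasible but takes a bit of elbow grease. (I'm comfortable with it, but not all users will be, and some enterprise networks make this step difficult or specifically disallow it.)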
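To illustrate the vacuum hint in the list above, here is a minimal sketch. It assumes SQLite's default settings (auto-vacuum off), and the temporary file, table name, and random data are made up for the demonstration.

```r
library(DBI)

fname <- tempfile(fileext = ".sqlite3")  # throwaway file for illustration
con <- dbConnect(RSQLite::SQLite(), fname)
dbWriteTable(con, "big", data.frame(x = runif(2e5), y = runif(2e5)))
size_full <- file.info(fname)$size

# Deleting rows frees pages inside the file; under default settings the
# file itself is usually not shrunk at this point
dbExecute(con, "DELETE FROM big")
size_after_delete <- file.info(fname)$size

# VACUUM rebuilds the database file and releases the free pages
dbExecute(con, "VACUUM")
size_after_vacuum <- file.info(fname)$size

dbDisconnect(con)
unlink(fname)
```

Comparing size_full, size_after_delete, and size_after_vacuum shows where the space is actually reclaimed.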

SQLite File-as-CSV

If you are looking to import everything, you can think of it as just a file when importing:

con <- dbConnect(RSQLite::SQLite(), fname)
iris2 <- dbGetQuery(con, "select * from iris")
dbDisconnect(con)

as compared with

iris2 <- read.csv("iris.csv", stringsAsFactors = FALSE)

If you want to get fancy:

import_sqlite <- function(fname, tablename = NA) {
  if (length(tablename) > 1L) {
    warning("the condition has length > 1 and only the first element will be used")
    tablename <- tablename[[1L]]
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), fname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  available_tables <- DBI::dbListTables(con)
  if (length(available_tables) == 0L) {
    stop("no tables found")
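  } else if (is.na(tablename)) {
    if (length(available_tables) == 1L) {
      # only one table, so use it by default
      tablename <- available_tables
    } else {
      # ambiguous: do not guess which table the caller wants
      stop("multiple tables found; please specify 'tablename'")
    }
  }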
  if (tablename %in% available_tables) {
    tablename <- DBI::dbQuoteIdentifier(con, tablename)
    qry <- sprintf("select * from %s", tablename)
    out <- tryCatch(list(data = DBI::dbGetQuery(con, DBI::SQL(qry)),
                         err = NULL),
                    error = function(e) list(data = NULL, err = e))
    if (! is.null(out$err)) {
      stop("[sqlite error] ", out$err$message)
    } else {
      return(out$data)
    }    
  } else {
    stop(sprintf("table %s not found", DBI::dbQuoteIdentifier(con, tablename)))
  }
}
head(import_sqlite("iris.sqlite3"))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

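(I'm not offering this function as anything more than a proof of concept that you can interact with a single file as if it were a CSV. There are some safeguards, but it's really just a hack for this question.)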
