Question

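I'm trying to write a function that logs the input and output of every join operation in my script, so that I can review all of them once the script has finished. The goal is to make sure that no data frame gets inflated along the way, for example because of redundant matches.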

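So far, I have manually wrapped each join operation in a layer of bookkeeping code. It illustrates what I want to do: take the input file1, join it with file2, and create file3. file3 can have the same name as file1 or be a new object. checkmerge is the documentation data frame, which grows with every merge operation.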

library(plyr)  # assumption: join() is plyr::join
merge <- "file1+file2=file3"
count <- nrow(file1)                      # row count before the join
check_t1 <- data.frame(merge, count)
file3 <- join(file1, file2, by = c("firmid", "year"), type = "left")
count <- nrow(file3)                      # row count after the join
check_t2 <- data.frame(merge, count)
checkmerge <- rbind(checkmerge, check_t1, check_t2)

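This works for me. However, doing it by hand (a) leaves room for error and (b) bloats the script and makes it hard to read. So I would like to write a function that does this. I'm new to writing functions, but here is my attempt (which doesn't work):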

##Initialize checkmerge file

mergedat <- as.character(NULL)
countdat <- as.numeric(NULL)
checkmerge <- data.frame(mergedat, countdat)

##Define function    

fun.docmerge <- function(x, y, z, crit, typ, doc = checkmerge) {
      mergedat <- paste(deparse(substitute(x)), "+",
                        deparse(substitute(y)), "=", z)
      countdat <- nrow(x)
      check_t1 <- data.frame(mergedat, countdat)
      z <- join(x, y, by = crit, type = typ)
      countdat <- nrow(z)
      check_t2 <- data.frame(mergedat, countdat)
      doc <- rbind(doc, check_t1, check_t2)
    }

I then call it, expecting the same result as with the manual approach above:

fun.docmerge(x = file1, y = file2, z = "file3", crit = c("firmid", "year"), typ = "left")

However, when I call the function, nothing happens: no object changes, and I get no error message either.

How can I adjust my function so that it replicates what I previously did by hand?

Answer 1

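Your function has several problems. In the line merge <- paste(x, "+", y, "=", z), x and y are not strings holding the variable names but the data frames themselves. You can handle this with: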

paste(deparse(substitute(x)), "+",
    deparse(substitute(y)), "=", z)
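
As a small illustration of that idiom, the sketch below wraps it in a helper. The name label_join and the toy data frames are made up for this example and are not part of the question's code.

```r
# Hypothetical helper: builds the log label from the *names* of the
# objects passed as x and y, not from their contents.
label_join <- function(x, y, z) {
  paste(deparse(substitute(x)), "+",
        deparse(substitute(y)), "=", z)
}

# Toy inputs, invented for the example.
file1 <- data.frame(firmid = 1:3, year = 2001:2003)
file2 <- data.frame(firmid = 1:3, size = c("a", "b", "c"))

label_join(file1, file2, "file3")
# [1] "file1 + file2 = file3"
```

Note that paste() separates its arguments with a space by default, so the label contains spaces around "+" and "=".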

By the way, merge is a base function in R, so you should not use that name for a new object.

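In addition, the line checkmerge <- rbind(checkmerge, check_t1, check_t2) refers to an object, checkmerge, that has not been created at that point. Either replace it with rbind(check_t1, check_t2), or pass checkmerge to the function as an argument.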
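
The second option could look roughly like this. The helper name add_log_row and the row counts are invented for illustration; the column names follow the question.

```r
# Create the (empty) log before the first call, with the expected columns.
checkmerge <- data.frame(mergedat = character(0), countdat = numeric(0))

# Hypothetical helper: receives the current log as an argument and
# returns it with one more row appended.
add_log_row <- function(doc, label, count) {
  rbind(doc, data.frame(mergedat = label, countdat = count))
}

# The caller assigns the returned log back to checkmerge each time.
checkmerge <- add_log_row(checkmerge, "file1+file2=file3", 100)
checkmerge <- add_log_row(checkmerge, "file1+file2=file3", 120)
nrow(checkmerge)
# [1] 2
```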

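Edit: assuming you want the function to return doc, you need to add return(doc) as the last statement of the function. Assignments inside a function only change local copies, so checkmerge in the calling environment stays untouched unless you assign the returned value back to it. The same applies to the joined data frame z.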
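
Putting these points together, one possible shape for the function is sketched below. It is not the asker's final code: the name logged_join and the toy data are made up, and base R's merge() with all.x = TRUE stands in for join(..., type = "left") so that the example runs without extra packages. The function returns both the joined data and the updated log in a list, because it cannot modify either object in the caller's environment.

```r
# Sketch of a logged left join. merge() is a stand-in for join();
# this substitution is an assumption made for the example.
logged_join <- function(x, y, z, crit, doc) {
  label <- paste(deparse(substitute(x)), "+",
                 deparse(substitute(y)), "=", z)
  before <- data.frame(mergedat = label, countdat = nrow(x))
  joined <- merge(x, y, by = crit, all.x = TRUE)  # left join
  after <- data.frame(mergedat = label, countdat = nrow(joined))
  # Return both results; nothing outside the function is changed.
  list(data = joined, doc = rbind(doc, before, after))
}

# Toy data: file2 lists firm 1 twice, so the join duplicates that row.
file1 <- data.frame(firmid = c(1, 2), year = c(2001, 2001),
                    sales = c(10, 20))
file2 <- data.frame(firmid = c(1, 1, 2), year = c(2001, 2001, 2001),
                    size = c("a", "b", "c"))

checkmerge <- data.frame(mergedat = character(0), countdat = numeric(0))

res <- logged_join(file1, file2, "file3",
                   crit = c("firmid", "year"), doc = checkmerge)
file3 <- res$data        # the joined data frame
checkmerge <- res$doc    # the log, now two rows longer
checkmerge
```

In this toy case the log shows 2 rows before the join and 3 after, which is exactly the kind of inflation the question wants to catch.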
