Question

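I have data.frame objects in a list, and I want to split each one by comparing its score column against a given threshold. Specifically, I want my function to return only the rows of each data.frame whose score is greater than the threshold, while exporting the rows at or below the threshold to a csv file (I will process the saved data.frames further, and the exported ones will only be listed in a summary at the end). I know it would be easier to split the data.frames first, then export them to csv, and then process the ones I need. But I would like to do all of this in a single wrapper function. Can anyone point out how to produce both outputs from my function more efficiently? Any ideas?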

Minimal example:

mylist <- list(
  foo=data.frame( from=seq(1, by=4, len=16), to=seq(3, by=4, len=16), score=sample(30, 16)),
  bar=data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), score=sample(30, 20)),
  cat=data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), score=sample(30, 25)))

I intend to split them like this:

func <- function(list, threshold=16, ...) {
  # input param checking
  stopifnot(is.numeric(threshold))
  reslt <- lapply(list, function(elm) {
    res <- split(elm, c("Droped", "Saved")[(elm$score > threshold)+1])
    # FIXME : anyway to export Droped instance while return Saved
  })
}

In this sketch, I intend to export the Droped rows of each data.frame to a csv file, while returning the Saved rows of each data.frame as the output for further processing.
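To make the indexing trick in the sketch concrete: `(score > threshold) + 1` turns the logical vector into 1 or 2, which then picks "Droped" or "Saved" as the grouping label for `split()`. A minimal illustration, assuming a small hand-made data frame (`df` and its values are made up for this example and are not from the post):

```r
# Hypothetical toy data frame; the values are made up for illustration.
df <- data.frame(from = c(1, 5, 9), to = c(3, 7, 11), score = c(10, 20, 16))
threshold <- 16

# FALSE + 1 = 1 selects "Droped", TRUE + 1 = 2 selects "Saved"
labels <- c("Droped", "Saved")[(df$score > threshold) + 1]

# split() returns a list with one data frame per label that occurs
parts <- split(df, labels)
parts$Saved   # rows with score > threshold
parts$Droped  # rows with score <= threshold
```

Note that if every row falls on the same side of the threshold, `split()` returns only that one group, so code that reads `parts$Saved` or `parts$Droped` should be prepared for a `NULL`.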

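I tried to implement this in my function, but my approach is not efficient. Can anyone point out how to achieve this easily? Does anyone know a trick that would produce my expected output more elegantly? Thanks in advance.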

Answer 1

You can fold both steps into a single lapply call, like this:

# function to perform both tasks on one data frame in mylist
splitter <- function(i, threshold) {

  require(dplyr)

  DF <- mylist[[i]]

  DF %>%
    filter(score <= threshold) %>%
    write.csv(., sprintf("dropped.%s.csv", i), row.names = FALSE)

  Saved <- filter(DF, score > threshold)

  return(Saved)

}

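# now use the function to create a new list, mylist2, with the Saved
# versions as its elements. the csvs of the dropped rows get created
# as this runs. note that mylist2 is unnamed, because lapply iterates
# over the indices of mylist rather than over its names.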
mylist2 <- lapply(seq_along(mylist), function(i) splitter(i, 16))
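The same idea can also be written as a single wrapper that takes the list as an argument instead of reading `mylist` from the global environment, and that keeps the element names on the result. The following is only a sketch in base R under some assumptions: the function name `split_and_export` and the `out_dir` argument are hypothetical, the input list is assumed to be named, and the `score` column is assumed to contain no `NA` values.

```r
# Sketch of a single wrapper; `split_and_export` and `out_dir` are
# hypothetical names, not part of the original post.
split_and_export <- function(lst, threshold = 16, out_dir = tempdir()) {
  stopifnot(is.list(lst), is.numeric(threshold), length(threshold) == 1)
  # Iterate over names so each csv is labelled by its list element
  saved <- lapply(names(lst), function(nm) {
    df <- lst[[nm]]
    keep <- df$score > threshold
    # Side effect: write the rows at or below the threshold to disk
    write.csv(df[!keep, , drop = FALSE],
              file.path(out_dir, sprintf("dropped.%s.csv", nm)),
              row.names = FALSE)
    # Return value: the rows above the threshold
    df[keep, , drop = FALSE]
  })
  setNames(saved, names(lst))
}

# Small made-up input so the example is self-contained
mylist <- list(
  foo = data.frame(from = 1:4, to = 5:8, score = c(5, 20, 16, 30)),
  bar = data.frame(from = 1:3, to = 4:6, score = c(17, 3, 25)))
saved <- split_and_export(mylist, threshold = 16)
```

Here `saved` is a named list of the rows above the threshold, and one `dropped.<name>.csv` file per element is written to `out_dir` (a temporary directory by default in this sketch). Passing the list explicitly keeps the function reusable for other lists.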
