Question

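I have a list of data.frames as the input to my custom function, and I want the function to return multiple lists of data.frames. I made some changes to the code in my function, but it returns unexpected output. Can anyone suggest how to improve the code in my custom function? Where does my code go wrong? Any hints?

Data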

myList <- list(
  foo = data.frame( start=seq(1, by=4, len=6), stop=seq(3, by=4, len=6)),
  bar = data.frame(start=seq(5, by=2, len=7), stop=seq(7, by=2, len=7)),
  bleh = data.frame(start=seq(1, by=5, len=5), stop=seq(3, by=5, len=5))
)

The custom function that needs improvement:

func <- function(set) {
  # check input param
  stopifnot(is.list(set))
  output <- list()
  require(dplyr)
  for(id in 1: seq_along(set)) {
    entry <- set[[id]]
    self_ <- setdiff(entry, entry)
    res <- lapply(set[-id], function(ele_) {
      joined_ <- setdiff(entry, ele_)
    })
    ans <- c(list(self_), res)
    names(ans) <- c(names(set[id]),names(set[-id]))
    output[id] <- ans
  }
  return(output)
}

Expected output

I expect my custom function to return multiple lists of data.frame objects. Can anyone give me some ideas? Thanks.

Answer 1

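I am still having some trouble understanding what you intend, but here is a suggestion for a cleaner solution.

First, it is usually much easier to store the data as a flat data frame: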

library(plyr)
df <- ldply(df.list, rbind, .id = 'group1')

   group1 V1 V2
1       a  1  1
2       a  1  0
3       a  1  4
4       a  2  5
...   
18      c  4  3
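
For the `myList` from the question, the same flattening step might look like the sketch below. It uses `dplyr::bind_rows` instead of the `ldply` call above (a substitution for illustration, not the method used in this answer), and the name `flat` is hypothetical.

```r
library(dplyr)

# Data copied from the question
myList <- list(
  foo  = data.frame(start = seq(1, by = 4, length.out = 6), stop = seq(3, by = 4, length.out = 6)),
  bar  = data.frame(start = seq(5, by = 2, length.out = 7), stop = seq(7, by = 2, length.out = 7)),
  bleh = data.frame(start = seq(1, by = 5, length.out = 5), stop = seq(3, by = 5, length.out = 5))
)

# Stack the data frames; the list names are stored in a new "group1" column
flat <- bind_rows(myList, .id = "group1")
```

Each row of `flat` then carries the name of the list element it came from, so the grouping survives without a nested list.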

Then we can use plyr to iterate over every pair of groups and compute their set difference:

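# Assumes dplyr is installed: dplyr::setdiff compares data frames row by row
df.setdiff <- ddply(df, .(group1), function(x) {
    # All rows that belong to the other groups
    comparisons <- subset(df, group1 != x$group1[1])
    colnames(comparisons) <- c('group2', 'V1', 'V2')
    # For each other group, keep the rows of x that are absent from it
    ddply(comparisons, .(group2), function(y) {
        dplyr::setdiff(x[c('V1', 'V2')], y[c('V1', 'V2')])
    })
})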

This produces a data frame:

   group1 group2 V1 V2
1       a      b  1  1
2       a      b  1  0
3       a      b  1  4
4       a      b  2  5
5       a      b  3  0
6       a      b  0  2
7       a      c  1  4
8       a      c  2  5
9       a      c  3  0
10      a      c  0  2
...
24      c      b  0  3

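Some comparisons appear twice, because each group can appear in either the "group1" or the "group2" column, and my code does not skip these duplicates, but this should get you started.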
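
Coming back to the loop in the question, two lines look likely to cause the unexpected output. `1:seq_along(set)` uses only the first element of `seq_along(set)` (with a warning), so the loop body runs once. `output[id] <- ans` tries to put a whole list into a single slot, so only its first element is kept (again with a warning). Below is a minimal sketch of a repaired version. It assumes the goal is, for each data frame, the rows not found in each of the others, and that `dplyr::setdiff` is the intended set difference. The name `pairwise_setdiff` is made up for illustration.

```r
# Hypothetical repaired version of the question's func(); requires dplyr
pairwise_setdiff <- function(set) {
  stopifnot(is.list(set), !is.null(names(set)))
  # Pre-allocate one slot per input data frame and keep the input names
  output <- vector("list", length(set))
  names(output) <- names(set)
  for (id in seq_along(set)) {          # seq_along(), not 1:seq_along()
    entry <- set[[id]]
    # Rows of `entry` that do not occur in each of the other data frames
    output[[id]] <- lapply(set[-id], function(other) dplyr::setdiff(entry, other))
  }
  output
}

# Data copied from the question
myList <- list(
  foo  = data.frame(start = seq(1, by = 4, length.out = 6), stop = seq(3, by = 4, length.out = 6)),
  bar  = data.frame(start = seq(5, by = 2, length.out = 7), stop = seq(7, by = 2, length.out = 7)),
  bleh = data.frame(start = seq(1, by = 5, length.out = 5), stop = seq(3, by = 5, length.out = 5))
)

result <- pairwise_setdiff(myList)
result$foo$bar   # rows of foo that are absent from bar
```

The self comparison `setdiff(entry, entry)` from the original is left out here because it is always empty. If that empty placeholder is wanted, it can be prepended with `c(list(...), ...)` as in the question, as long as the result is assigned with `[[id]]`.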
