Question

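I have a set of .csv files, each containing the same number of rows and columns. Each file holds observations (column 'value') of test subjects characterized by A, B and C, in a form like the following: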

A B C value
1 1 1 0.5
1 1 2 0.6
1 2 1 0.1
1 2 2 0.2
. . . .

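Suppose each file is read into a separate data frame. What is the most efficient way to combine these data frames into a single data frame whose 'value' column contains the result of some function applied to all of the 'value' entries for a given test subject? Columns A, B and C are constant across all files and can be treated as the keys for these observations.

Thanks for your help.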

Answer 1

This should be straightforward, assuming the files are all sorted the same way:

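# read every .csv file in the working directory (pattern is a regular expression)
dflist <- lapply(dir(pattern = '\\.csv$'), read.csv)
# one 'value' column per file, one row per test subject
values <- do.call('cbind', lapply(dflist, `[`, 'value'))
# row means:
rowMeans(values)
# other function `myfun` applied to each row:
apply(values, 1, myfun)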
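
The row-wise calls above return a bare vector rather than a data frame. A minimal sketch of attaching the result back to the key columns follows, assuming every file has identical A, B and C columns in the same row order. The helper name `combine_values` and the two sample frames are hypothetical, used only for illustration.

```r
# Hypothetical helper: combine the 'value' columns of equally ordered data frames.
combine_values <- function(dflist, fun = mean) {
  keys <- dflist[[1]][c("A", "B", "C")]
  # Guard against files whose keys are not in the same row order.
  same_keys <- vapply(
    dflist,
    function(d) all(d[c("A", "B", "C")] == keys),
    logical(1)
  )
  stopifnot(all(same_keys))
  # One column per file, one row per test subject.
  values <- do.call(cbind, lapply(dflist, `[[`, "value"))
  # Apply the summary function across files and attach the keys.
  cbind(keys, value = apply(values, 1, fun))
}

# Illustrative data standing in for two files.
df1 <- data.frame(A = c(1, 1), B = c(1, 2), C = c(1, 1), value = c(0.5, 0.1))
df2 <- data.frame(A = c(1, 1), B = c(1, 2), C = c(1, 1), value = c(0.7, 0.3))
combined <- combine_values(list(df1, df2))
```

Passing a different `fun` (for example `median` or `max`) changes the summary without touching the rest of the helper.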

Answer 2

If the keys may appear in any order, or some may be missing, here is another solution:

n <- 10  # number of csv files to create
obs <- 10  # number of observations per file
# create test files in the session's temporary directory
for (i in 1:n){
    df <- data.frame(A = sample(1:3, obs, TRUE)
                , B = sample(1:3, obs, TRUE)
                , C = sample(1:3, obs, TRUE)
                , value = runif(obs)
                )
    write.csv(df, file = tempfile(fileext = '.csv'), row.names = FALSE)
}


# read in the data (the pattern is a regular expression, not a glob)
input <- lapply(list.files(tempdir(), "\\.csv$", full.names = TRUE)
    , function(file) read.csv(file)
    )

# put the data frames together and then compute the mean for each unique
# combination of A, B & C, assuming that they could be in any order.
input <- do.call(rbind, input)
result <- lapply(split(input, list(input$A, input$B, input$C), drop = TRUE)
    , function(sect){
        sect$value[1L] <- mean(sect$value)
        sect[1L, ]
    }
)

# create output DF
result <- do.call(rbind, result)
result
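
For comparison, the same per-key mean can probably be written more compactly with base R's `aggregate`. This is a sketch on made-up data rather than the answerer's method; the frame `stacked` is a hypothetical stand-in for the row-bound `input` above.

```r
# Illustrative stand-in for the row-bound data from all files.
stacked <- data.frame(
  A = c(1, 1, 2, 1),
  B = c(1, 1, 1, 2),
  C = c(1, 1, 1, 1),
  value = c(0.2, 0.4, 0.5, 0.9)
)

# One row per unique (A, B, C) combination, holding the mean of 'value'.
agg <- aggregate(value ~ A + B + C, data = stacked, FUN = mean)
```

Any other summary function can be supplied through `FUN`; key combinations that never occur in the data simply do not appear in the output.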

Combining and aggregating multiple data.frames
