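I have the following two data frames in mylist. For each data frame, I want to calculate the difference between the number of observations in the group (identified by "type") that contains the maximum "value" and the number of observations in the other group.
So for df1, this would be 3 - 6 = -3, because type B contains the maximum value 7, type B has 3 observations, and type A has 6 observations.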
value <- c(1, 2, 3, 4, 5, 6, 1, 2, 7)
type <- c("A", "A", "A", "A", "A", "A", "B", "B", "B")
df1 <- data.frame(value, type)
value <- c(1, 2, 3, 4, 6, 1, 2)
type <- c("A", "A", "A", "A", "A", "B", "B")
df2 <- data.frame(value, type)
mylist <- list(df1, df2)
I think something along the lines of combining length(unique()) and max() would work, but I can't figure it out.
calculation <- lapply(mylist, function (x)
{x$#the count of observations of the type that includes the max value#) - (x$#the count of the observations of the type that does not include the max value)})
Answer 0 (score: 3)
One trick here is to notice that your calculation can be simplified:
[number in group] - [number not in group]
= [number in group] - ([number of rows] - [number in group])
= [number in group] - [number of rows] + [number in group]
= 2 * [number in group] - [number of rows]
So you can do this:
lapply(mylist, function(x) {2*sum(x$type==x$type[which.max(x$value)])-nrow(x)})
This returns:
[[1]]
[1] -3
[[2]]
[1] 3
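As a quick check of the algebra above, the following sketch computes both the direct form and the simplified form on the question's data and compares them. The names `direct` and `simplified` are introduced here for illustration only and do not appear in the original answer.

```r
# Rebuild the example data from the question
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))
df2 <- data.frame(value = c(1, 2, 3, 4, 6, 1, 2),
                  type  = c("A", "A", "A", "A", "A", "B", "B"))
mylist <- list(df1, df2)

# Direct form: [number in group] - [number not in group]
direct <- sapply(mylist, function(x) {
  in_group <- x$type == x$type[which.max(x$value)]  # TRUE for rows of the max type
  sum(in_group) - sum(!in_group)
})

# Simplified form: 2 * [number in group] - [number of rows]
simplified <- sapply(mylist, function(x) {
  2 * sum(x$type == x$type[which.max(x$value)]) - nrow(x)
})

# Compare by value; identical() would be too strict here because one
# result is integer and the other is double
all(direct == simplified)
```

Note that which.max() returns the index of the first maximum, so if the maximum value is tied across groups, the group that appears first in the data is the one used.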
Hope this helps!
Answer 1 (score: 2)
If you want to break it down step by step:
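lapply(mylist, function(x){
  x[,"value"] <- as.numeric(x[,"value"])
  # Row index of the first occurrence of the maximum value
  MAX_FLAG <- which(x[,"value"] == max(x[,"value"]))[1]
  # Type of that row
  MAX_FLAG <- x[MAX_FLAG,"type"]
  # Number of observations in each group
  A <- length(which(x[,"type"] == "A" ))
  B <- length(which(x[,"type"] == "B" ))
  # Count of the max group minus count of the other group
  BA <- ifelse( MAX_FLAG == "B",B-A,A-B)
  return(BA)
}
)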
With a little adaptation, you can easily handle more than the 2 groups used here (i.e., A & B).
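One possible adaptation is sketched below. It assumes that, with more than two groups, "the other group" means all rows outside the group holding the maximum; the question does not define this case, so treat that as an assumption. The helper `count_diff` and the data frame `df3` are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: count of the group holding the maximum value minus
# the count of all remaining rows, for any number of groups
count_diff <- function(x) {
  counts   <- table(x$type)                              # observations per type
  max_type <- as.character(x$type[which.max(x$value)])   # type holding the maximum
  in_max   <- counts[[max_type]]                         # size of that group
  in_max - (sum(counts) - in_max)                        # assumption: "other" = everything else
}

# Made-up example with three groups; the maximum (9) is in type B
df3 <- data.frame(value = c(1, 9, 3, 4, 5, 2),
                  type  = c("A", "B", "B", "C", "C", "C"))
count_diff(df3)
```

Here type B has 2 observations and the remaining groups have 4 in total, so the call gives -2. For data with exactly two groups, this reduces to the same difference as the code above.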
Hope this helps,
Gottavianoni
Answer 2 (score: 1)
You can also use aggregate to count the number of observations in each group:
calculations <- lapply(mylist, function(df) {
  sum_df <- aggregate(value~type, df, FUN = length)
  max_type <- df$type[which.max(df$value)]
  sum_df$value[sum_df$type == max_type] - sum_df$value[sum_df$type != max_type]
})
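To make the intermediate steps easier to follow, here is a small sketch that runs the same two steps on df1 alone and inspects the results. It reuses the names `sum_df` and `max_type` from the answer; nothing else is assumed.

```r
# df1 from the question
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))

# Count rows per type; the count is stored in the "value" column,
# with one row per type (A first, then B)
sum_df <- aggregate(value ~ type, df1, FUN = length)
sum_df$value

# Type of the row holding the (first) maximum value
max_type <- df1$type[which.max(df1$value)]
as.character(max_type)
```

For df1, the counts are 6 for A and 3 for B, and the maximum sits in type B, so the final subtraction is 3 - 6. If there are more than two types, `sum_df$type != max_type` selects several rows, so the subtraction returns one difference per remaining group rather than a single number.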