Question

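I have the following two data frames in mylist. For each data frame, I want to calculate the difference between the number of observations in the group (identified by "type") that contains the maximum "value" and the number of observations in the other group.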

So for df1 this would be 3 - 6 = -3, because type B contains the maximum value 7, type B has 3 observations, and type A has 6 observations.

value <- c(1, 2, 3, 4, 5, 6, 1, 2, 7)
type  <- c("A", "A", "A", "A", "A", "A", "B", "B", "B")
df1   <- data.frame(value, type)

value <- c(1, 2, 3, 4, 6, 1, 2)
type  <- c("A", "A", "A", "A", "A", "B", "B")
df2   <- data.frame(value, type)

mylist <- list(df1, df2)

I think the solution involves something like combining length(unique()) and max(), but I can't work it out.

calculation <- lapply(mylist, function(x) {
  # (count of observations of the type that includes the max value) -
  # (count of observations of the type that does not include the max value)
})

Answer 1

One trick here is to notice that your calculation can be simplified:

[number in group] - [number not in group]
= [number in group] - ([number of rows] - [number in group])
= [number in group] - [number of rows] + [number in group]
= 2 * [number in group] - [number of rows]

So you can do this:

lapply(mylist, function(x) {
  2 * sum(x$type == x$type[which.max(x$value)]) - nrow(x)
})

This returns:

[[1]]
[1] -3

[[2]]
[1] 3
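As a quick sanity check on the simplification above, the following sketch computes the difference both ways (explicitly, and with the 2 * [number in group] - [number of rows] shortcut) on the question's data. The names `explicit` and `shortcut` are only illustrative, and the shortcut assumes there are exactly two groups.

```r
# Rebuild the question's data so the check is self-contained.
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))
df2 <- data.frame(value = c(1, 2, 3, 4, 6, 1, 2),
                  type  = c("A", "A", "A", "A", "A", "B", "B"))
mylist <- list(df1, df2)

# Explicit version: (rows in the max group) - (rows outside it).
explicit <- sapply(mylist, function(x) {
  in_max <- x$type == x$type[which.max(x$value)]
  sum(in_max) - sum(!in_max)
})

# Shortcut from the derivation: 2 * (rows in the max group) - (all rows).
shortcut <- sapply(mylist, function(x) {
  2 * sum(x$type == x$type[which.max(x$value)]) - nrow(x)
})

print(explicit)
print(shortcut)
```

Both vectors should agree element by element, since the two expressions are algebraically identical.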

Hope this helps!

Answer 2

If you want to break it down step by step:

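lapply(mylist, function(x) {
  # Make sure the values are numeric before comparing them
  x[, "value"] <- as.numeric(x[, "value"])
  # Row index of the first occurrence of the maximum value
  MAX_FLAG <- which(x[, "value"] == max(x[, "value"]))[1]
  # Type of the row that holds the maximum
  MAX_FLAG <- x[MAX_FLAG, "type"]
  # Number of observations in each group
  A <- length(which(x[, "type"] == "A"))
  B <- length(which(x[, "type"] == "B"))
  # Count of the max group minus the count of the other group
  BA <- ifelse(MAX_FLAG == "B", B - A, A - B)
  return(BA)
})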

With a little adaptation, this can easily be extended to more than two groups (i.e. beyond just A & B).
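One possible adaptation for more than two groups is sketched below. It is only an illustration, not the answerer's own code: the helper name `count_diff` and the three-group data frame `df3` are made up, and it assumes "the difference" should be reported separately against each of the other groups.

```r
# Hypothetical three-group data frame (not from the question).
df3 <- data.frame(value = c(1, 2, 3, 9, 5, 6, 4),
                  type  = c("A", "A", "A", "B", "B", "C", "C"))

# Hypothetical helper: count of the max group minus the count of every other group.
count_diff <- function(x) {
  # Named vector with the number of observations per type
  counts <- sapply(split(x$value, x$type), length)
  # Type of the row holding the (first) maximum value
  max_type <- as.character(x$type[which.max(x$value)])
  # Counts of all groups except the one containing the maximum
  others <- counts[names(counts) != max_type]
  counts[[max_type]] - others
}

res <- count_diff(df3)
print(res)
```

Here the maximum (9) falls in type B, so the result is a named vector with one difference per remaining type (B versus A, and B versus C). With exactly two groups it reduces to the single number the question asks for.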

Hope it helps,

Gottavianoni

Answer 3

You can also use aggregate to count the number of observations in each group:

calculations <- lapply(mylist, function(df) {
  sum_df <- aggregate(value~type, df, FUN = length)
  max_type <- df$type[which.max(df$value)]
  sum_df$value[sum_df$type == max_type] - sum_df$value[sum_df$type != max_type]
})
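To make the intermediate step more concrete, here is a small sketch that runs the same logic on df1 alone and prints the aggregated counts. The variable name `diff_df1` is only for illustration.

```r
# df1 from the question, rebuilt so the example is self-contained.
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))

# One row per type; the `value` column now holds the number of observations.
sum_df <- aggregate(value ~ type, df1, FUN = length)
print(sum_df)

# Type of the row holding the maximum value.
max_type <- df1$type[which.max(df1$value)]

# Count of the max type minus the count of the other type.
diff_df1 <- sum_df$value[sum_df$type == max_type] -
  sum_df$value[sum_df$type != max_type]
print(diff_df1)  # 3 - 6 = -3, matching the question's expected result
```

Note that with more than two types, the `!=` subset selects several counts, so the subtraction would return a vector rather than a single number.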

Calculate the difference in observation counts between groups based on the values a group contains
