Calculating the difference in observation counts between groups, based on which group contains a given value

Time: 2018-04-19 11:32:07

Tags: r list count max lapply

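I have the following two data frames in mylist. For each data frame, I want to calculate the difference between the number of observations in the group (identified by "type") that contains the maximum "value" and the number of observations in the other group.

So for df1 this would be 3 - 6 = -3, because type B contains the maximum value 7, type B has 3 observations, and type A has 6 observations.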

value <- c(1, 2, 3, 4, 5, 6, 1, 2, 7)
type  <- c("A", "A", "A", "A", "A", "A", "B", "B", "B")
df1   <- data.frame(value, type)

value <- c(1, 2, 3, 4, 6, 1, 2)
type  <- c("A", "A", "A", "A", "A", "B", "B")
df2   <- data.frame(value, type)

mylist <- list(df1, df2)

I think something along the following lines, using length(unique()) combined with max(), should work, but I can't figure it out.

calculation <- lapply(mylist, function(x) {
  # (count of observations of the type that includes the max value) -
  # (count of observations of the type that does not include the max value)
})

3 answers:

Answer 0: (score: 3)

One trick here is to notice that your calculation can be simplified:

[number in group] - [number not in group]
= [number in group] - ([number of rows] - [number in group])
= [number in group] - [number of rows] + [number in group]
= 2 * [number in group] - [number of rows]

So you can do this:

lapply(mylist, function(x) {
  2 * sum(x$type == x$type[which.max(x$value)]) - nrow(x)
})

This returns:

[[1]]
[1] -3

[[2]]
[1] 3

Hope this helps!
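As a quick sanity check on the simplification above, here is a minimal sketch that computes the difference both ways and confirms they agree on the question's data. The helper names `direct_diff` and `simplified_diff` are illustrative only and do not appear in the answer.

```r
# Rebuild the example data from the question.
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))
df2 <- data.frame(value = c(1, 2, 3, 4, 6, 1, 2),
                  type  = c("A", "A", "A", "A", "A", "B", "B"))
mylist <- list(df1, df2)

# Hypothetical helper: [number in group] - [number not in group].
direct_diff <- function(x) {
  in_group <- x$type == x$type[which.max(x$value)]
  sum(in_group) - sum(!in_group)
}

# Hypothetical helper: 2 * [number in group] - [number of rows].
simplified_diff <- function(x) {
  in_group <- x$type == x$type[which.max(x$value)]
  2 * sum(in_group) - nrow(x)
}

direct     <- sapply(mylist, direct_diff)
simplified <- sapply(mylist, simplified_diff)

# Both forms should give -3 for df1 and 3 for df2.
stopifnot(all(direct == simplified))
```

Note that `which.max()` returns the index of the first maximum, so if the maximum value were tied across groups, the group of the first occurrence would be used.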

Answer 1: (score: 2)

If you want to break it down into steps:

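lapply(mylist, function(x) {
  # Make sure the values are numeric before comparing them
  x[, "value"] <- as.numeric(x[, "value"])
  # Row index of the first occurrence of the maximum value
  MAX_FLAG <- which(x[, "value"] == max(x[, "value"]))[1]
  # Type of the group that contains the maximum
  MAX_FLAG <- x[MAX_FLAG, "type"]
  # Number of observations in each group
  A <- length(which(x[, "type"] == "A"))
  B <- length(which(x[, "type"] == "B"))
  # Count of the max group minus count of the other group
  BA <- ifelse(MAX_FLAG == "B", B - A, A - B)
  return(BA)
})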

With a little adaptation, this can easily be extended to more than two groups (i.e. beyond A & B).

Hope it helps,

Gottavianoni
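One possible adaptation to more than two groups is sketched below. It assumes that "the other group" generalizes to "all rows outside the group that holds the maximum", which the question does not actually specify. The data frame `df3` and the helper `max_group_diff` are made up for illustration.

```r
# Illustrative data with three groups (not from the question).
df3 <- data.frame(value = c(1, 2, 3, 9, 4, 5, 6, 7),
                  type  = c("A", "A", "A", "B", "B", "C", "C", "C"))

# Hypothetical helper: size of the group holding the maximum value
# minus the number of rows in all other groups combined.
max_group_diff <- function(x) {
  counts   <- table(x$type)                          # observations per type
  max_type <- as.character(x$type[which.max(x$value)])
  in_group <- as.integer(counts[max_type])           # size of the max group
  in_group - (sum(counts) - in_group)
}

# The maximum (9) is in type B, which has 2 rows; the other 6 rows
# belong to A and C, so the result is 2 - 6 = -4.
result3 <- max_group_diff(df3)
```

Because the group labels are taken from the data via `table()`, nothing is hard-coded to "A" and "B".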

Answer 2: (score: 1)

You can also use aggregate to count the number of observations in each group:

calculations <- lapply(mylist, function(df) {
  sum_df <- aggregate(value~type, df, FUN = length)
  max_type <- df$type[which.max(df$value)]
  sum_df$value[sum_df$type == max_type] - sum_df$value[sum_df$type != max_type]
})
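To make the intermediate step visible, the sketch below runs the same `aggregate()` call on df1 alone and then applies the final subtraction. The data is rebuilt here so the snippet stands on its own; the variable names `sum_df`, `max_type`, and `diff1` are used for illustration.

```r
# Rebuild df1 from the question.
df1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 1, 2, 7),
                  type  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))

# Count the rows per type; the "value" column now holds the group sizes.
sum_df <- aggregate(value ~ type, df1, FUN = length)
#   type value
# 1    A     6
# 2    B     3

# Type of the row holding the maximum value ("B" for df1).
max_type <- df1$type[which.max(df1$value)]

# Size of the max group minus size of the other group: 3 - 6.
diff1 <- sum_df$value[sum_df$type == max_type] -
  sum_df$value[sum_df$type != max_type]
```

The final subtraction yields a single number only when there are exactly two groups; with more groups, the right-hand side would hold several counts and the result would be a vector.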