Question

I have the following data frame:

IV      Device1     Device2    Device3
Color   Same        Same       Missing
Color   Different   Same       Missing
Color   Same        Unique     Missing
Shape   Same        Missing    Same
Shape   Different   Same       Different

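Explanation: each IV (independent variable) consists of several measurements (the 'Color' section consists of 3 different measurements, while 'Shape' consists of 2).

Each data point takes one of 4 possible categorical values: Same/Different/Unique/Missing. "Missing" means that the measurement has no value for that device, while the other 3 values represent an existing result for that measurement.

Question: for each device, I want to calculate how many times it has a Same/Different/Unique value (which yields 3 different percentages), out of the total number of values for that IV (excluding cases where the value is "Missing").

For example, Device2 would have the following percentages:

Color - 67% Same, 0% Different, 33% Unique.
Shape - 100% Same, 0% Different, 0% Unique.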
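
As a quick check of the arithmetic, the Color figures for Device2 can be reproduced by hand. This is only a sketch: the names `device2_color`, `levels_wanted`, `counts` and `pct` are made up for illustration, and the three values are copied from the table above.

```r
# Device2's three Color measurements, taken from the table above
device2_color <- c("Same", "Same", "Unique")

# Fix the level order so that categories with zero occurrences still appear
levels_wanted <- c("Same", "Different", "Unique")
counts <- table(factor(device2_color, levels = levels_wanted))

# Share of each category among the non-missing values, as whole percentages
pct <- round(100 * counts / sum(counts))
print(pct)
```

Two of the three values are Same and one is Unique, so the result is 67 / 0 / 33.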

Thanks!

Answer 1

Quick and dirty: first, replace your "Missing" values with NA using your preferred method (sed, Excel, etc.). Then you can use table() on each column to get the summary statistics:

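# Proportion of Same/Different/Unique among the non-missing values of one column
myStats <- function(x){
    table(factor(x, levels = c('Same', 'Different', 'Unique')))/sum(table(x))
}
# Drop the IV column so that only the device columns are summarised
apply(yourData[, -1], 2, myStats)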

This will return the summary you are after.
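
Note that the call above summarises each device over all rows at once. To get the breakdown per IV that the question asks for, the same function can be applied within each IV group. The following is a minimal sketch, assuming the data frame is called `yourData` (the placeholder name used above) and is rebuilt here from the question's table; `byIV` is a name chosen for illustration.

```r
# Rebuild the question's table; `yourData` is the placeholder name used above
yourData <- data.frame(
  IV      = c("Color", "Color", "Color", "Shape", "Shape"),
  Device1 = c("Same", "Different", "Same", "Same", "Different"),
  Device2 = c("Same", "Same", "Unique", "Missing", "Same"),
  Device3 = c("Missing", "Missing", "Missing", "Same", "Different"),
  stringsAsFactors = FALSE
)

# Treat "Missing" as NA so that table() leaves it out of the counts
yourData[yourData == "Missing"] <- NA

myStats <- function(x) {
  counts <- table(factor(x, levels = c("Same", "Different", "Unique")))
  counts / sum(counts)  # NaN when a device has no non-missing values
}

# Split the device columns by IV and summarise each group separately
byIV <- lapply(split(yourData[, -1], yourData$IV),
               function(d) sapply(d, myStats))
print(byIV$Color)
```

Each element of `byIV` is a matrix with one row per category and one column per device. A device with only missing values in a group (Device3 for Color) comes out as NaN, because the denominator is zero.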

Answer 2

This is not a tidy solution, but you can use it until someone else posts a better one:

# Replace all "Missing" with NAs
df[df == "Missing"] <- NA


# Create factor levels
df[,-1] <- lapply(df[,-1], function(x) {
        factor(x, levels = c('Same', 'Different', 'Unique'))
})


# Custom function to calculate percent of categorical responses
custom <- function(x) {
        y <- length(na.omit(x))
        if(y > 0) 
                return(round((table(x)/y)*100))
        else
                return(rep(0, 3))
}


library(purrr)

# Split the dataframe on IV, remove the IV column and apply the custom function
Final <- df %>% split(df$IV) %>% 
    map(., function(x) {
      x <- x[, -1]
      t(sapply(x, custom))
    })

Output

Final is a list of two matrices, one per IV:

$Color
        Same Different Unique
Device1   67        33      0
Device2   67         0     33
Device3    0         0      0

$Shape
        Same Different Unique
Device1   50        50      0
Device2  100         0      0
Device3   50        50      0
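
As an alternative sketch in base R only, the same percentages can be obtained by reshaping the data to long format and normalising a three-way contingency table with `prop.table()`. The data frame `raw` below is rebuilt from the question's table, and the names `raw`, `long`, `counts` and `pct` are chosen for illustration; none of them come from the answers above.

```r
# Rebuild the question's table under a hypothetical name, `raw`
raw <- data.frame(
  IV      = c("Color", "Color", "Color", "Shape", "Shape"),
  Device1 = c("Same", "Different", "Same", "Same", "Different"),
  Device2 = c("Same", "Same", "Unique", "Missing", "Same"),
  Device3 = c("Missing", "Missing", "Missing", "Same", "Different"),
  stringsAsFactors = FALSE
)

# Long format: one row per (IV, Device, Value) observation
long <- data.frame(
  IV     = rep(raw$IV, times = ncol(raw) - 1),
  Device = rep(names(raw)[-1], each = nrow(raw)),
  Value  = unlist(raw[-1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop missing measurements and fix the order of the remaining categories
long <- long[long$Value != "Missing", ]
long$Value <- factor(long$Value, levels = c("Same", "Different", "Unique"))

# Count by IV x Device x Value, then normalise within each IV/Device pair
counts <- table(long$IV, long$Device, long$Value)
pct <- round(100 * prop.table(counts, margin = c(1, 2)))

print(ftable(pct))
```

Unlike the `custom` function above, which returns zeros when a device has no non-missing values, this version yields NaN for such cells (Device3 for Color), which makes the absence of data explicit.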

Calculating percentages of categorical responses in R (with grouping)
