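How to get accuracy values by group

Asked: 2019-06-11 12:47:28

Tags: r aggregate summary

I can't get the mean accuracy (the proportion of TRUE values) in the Correct_answer column when grouping by chart type and by condition.

Data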

structure(list(Element = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6"), class = "factor"), Correct_answer = structure(c(2L, 
2L, 2L, 1L, 2L), .Label = c("FALSE", "TRUE"), class = "factor"), 
    Response_time = c(25.155, 6.74, 28.649, 16.112, 105.5906238
    ), Chart_type = structure(c(2L, 2L, 1L, 1L, 1L), .Label = c("Box", 
    "Violin"), class = "factor"), Condition = structure(c(1L, 
    2L, 1L, 2L, 1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA, 
5L), class = "data.frame")

Average by chart type

av_data_chartType <- data %>% group_by(Chart_type) %>% summarise_each(funs(mean, sd))

Average by condition

av_data_condition <- data %>% group_by(Condition) %>% summarise_each(funs(mean, sd))

Neither call produces a mean for the accuracy.

NA values appear where the accuracy should be.

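3 Answers:

Answer 0 (score: 3)

Running your code gave me a warning that points to the answer: you should not compute statistics on factor variables. If you know what you are doing, you can convert them to numeric: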

data <- structure(list(Element = structure(c(1L, 1L, 1L, 1L, 1L), 
                                         .Label = c("1", "2", "3", "4", "5", "6"), 
                                         class = "factor"), 
                     Correct_answer = structure(c(2L, 2L, 2L, 1L, 2L), 
                                                .Label = c("FALSE", "TRUE"), 
                                                class = "factor"), 
                     Response_time = c(25.155, 6.74, 28.649, 16.112, 105.5906238
                     ), 
                     Chart_type = structure(c(2L, 2L, 1L, 1L, 1L), 
                                            .Label = c("Box", 
                                                       "Violin"), 
                                            class = "factor"), 
                     Condition = structure(c(1L, 2L, 1L, 2L, 1L), 
                                           .Label = c("0", "1"), 
                                           class = "factor")),
                row.names = c(NA, 5L), class = "data.frame")

library("dplyr", warn.conflicts = FALSE)
data <- data %>% as_tibble

# av_data_chartType 
data %>% 
        group_by(Chart_type) %>%
        mutate_if(.predicate = is.factor, .funs = as.numeric) %>% 
        summarise_each(list( ~mean, ~sd))
#> `mutate_if()` ignored the following grouping variables:
#> Column `Chart_type`
#> # A tibble: 2 x 9
#>   Chart_type Element_mean Correct_answer_~ Response_time_m~ Condition_mean
#>   <fct>             <dbl>            <dbl>            <dbl>          <dbl>
#> 1 Box                   1             1.67             50.1           1.33
#> 2 Violin                1             2                15.9           1.5 
#> # ... with 4 more variables: Element_sd <dbl>, Correct_answer_sd <dbl>,
#> #   Response_time_sd <dbl>, Condition_sd <dbl>

# av_data_condition
data %>% 
        group_by(Condition) %>%
        mutate_if(.predicate = is.factor, .funs = as.numeric) %>% 
        summarise_each(list( ~mean, ~sd))
#> `mutate_if()` ignored the following grouping variables:
#> Column `Condition`
#> # A tibble: 2 x 9
#>   Condition Element_mean Correct_answer_~ Response_time_m~ Chart_type_mean
#>   <fct>            <dbl>            <dbl>            <dbl>           <dbl>
#> 1 0                    1              2               53.1            1.33
#> 2 1                    1              1.5             11.4            1.5 
#> # ... with 4 more variables: Element_sd <dbl>, Correct_answer_sd <dbl>,
#> #   Response_time_sd <dbl>, Chart_type_sd <dbl>

Created on 2019-06-11 by the reprex package (v0.2.1)
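One caveat about the numbers in the output above: as.numeric() on a factor returns the level codes (here FALSE = 1 and TRUE = 2), so a Correct_answer mean of 1.67 for Box corresponds to an accuracy of 0.67. The following base R sketch illustrates the difference on a standalone vector that copies the question's Correct_answer values; the variable names are made up for illustration.

```r
# Same values as the Correct_answer column in the question.
correct <- factor(c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE"))

# mean() on a factor warns and returns NA, which matches the NA in the question.
na_result <- suppressWarnings(mean(correct))

# as.numeric() returns the level codes (FALSE = 1, TRUE = 2),
# so this mean is the accuracy plus one.
code_mean <- mean(as.numeric(correct))

# as.logical() works from the level labels, so this mean is the
# proportion of TRUE values.
accuracy <- mean(as.logical(correct))

print(c(na_result, code_mean, accuracy))
```

Subtracting 1 from the code-based means, or converting with as.logical() before summarising, should give the proportion of correct answers directly.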

Answer 1 (score: 2)

This should work:

# `a` is the data frame from the question
a$Correct_answer <- as.logical(a$Correct_answer)

av_data_chartType <- a %>% select(Chart_type, Correct_answer) %>% group_by(Chart_type) %>% summarise_each(funs(mean, sd))

av_data_condition <- a %>% select(Condition, Correct_answer) %>% group_by(Condition) %>% summarise_each(funs(mean, sd))

You have two problems:

  1. Your Correct_answer column is a factor.

  2. You are applying the functions to every column.
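Both points can also be addressed without dplyr. The sketch below uses base R's aggregate() with a formula that names the one column to summarise. The data frame df is a hypothetical stand-in that copies only the relevant columns from the question.

```r
# Hypothetical stand-in for the question's data, reduced to the needed columns.
df <- data.frame(
  Chart_type     = factor(c("Violin", "Violin", "Box", "Box", "Box")),
  Condition      = factor(c("0", "1", "0", "1", "0")),
  Correct_answer = factor(c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE"))
)

# Problem 1: turn the factor into a logical so that mean() is defined.
df$Correct_answer <- as.logical(df$Correct_answer)

# Problem 2: name the column to summarise instead of summarising every column.
acc_by_chart     <- aggregate(Correct_answer ~ Chart_type, data = df, FUN = mean)
acc_by_condition <- aggregate(Correct_answer ~ Condition,  data = df, FUN = mean)

print(acc_by_chart)
print(acc_by_condition)
```

Each result is a small data frame with one row per group and the proportion of TRUE values in the Correct_answer column.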

Answer 2 (score: 2)

You probably need to convert Correct_answer to a logical and take its mean within each group. For every combination of Chart_type and Condition:

library(dplyr)

data %>%
  mutate(Correct_answer = as.logical(Correct_answer)) %>%
  group_by(Chart_type, Condition) %>%
  summarise(avg = mean(Correct_answer))
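If the accuracy is needed for each grouping variable on its own rather than for each combination, the same pattern can be run once per variable. This is a minimal sketch assuming dplyr is installed; df, acc, by_chart and by_condition are illustrative names, and df copies only the relevant columns from the question.

```r
library(dplyr)

# Hypothetical stand-in for the question's data, reduced to the needed columns.
df <- data.frame(
  Chart_type     = factor(c("Violin", "Violin", "Box", "Box", "Box")),
  Condition      = factor(c("0", "1", "0", "1", "0")),
  Correct_answer = factor(c("TRUE", "TRUE", "TRUE", "FALSE", "TRUE"))
)

# Convert once, then reuse the converted data for both summaries.
acc <- df %>%
  mutate(Correct_answer = as.logical(Correct_answer))

# Accuracy per chart type.
by_chart <- acc %>%
  group_by(Chart_type) %>%
  summarise(avg = mean(Correct_answer))

# Accuracy per condition.
by_condition <- acc %>%
  group_by(Condition) %>%
  summarise(avg = mean(Correct_answer))

print(by_chart)
print(by_condition)
```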