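Computing medians seems to be a bit of an Achilles heel for R (i.e. there is no data.frame method). What is the least amount of typing needed to get group medians from a data frame using dplyr?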
my_data <- structure(list(group = c("Group 1", "Group 1", "Group 1", "Group 1",
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 1",
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 2",
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2",
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2",
"Group 2", "Group 2"), value = c("5", "3", "6", "8", "10", "13",
"1", "4", "18", "4", "7", "9", "14", "15", "17", "7", "3", "9",
"10", "33", "15", "18", "6", "20", "30", NA, NA, NA, NA, NA)), .Names = c("group",
"value"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-30L))
library(dplyr)
# groups 1 & 2
my_data_groups_1_and_2 <- my_data[my_data$group %in% c("Group 1", "Group 2"), ]
# compute medians per group
medians <- my_data_groups_1_and_2 %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE))
This gives:
Error in summarise_impl(.data, dots) :
STRING_ELT() can only be applied to a 'character vector', not a 'double'
In addition: Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
What is the least-effort fix here?
Answer 0 (score: 1)
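As ivyleavedtoadflax commented, the error is caused by passing a non-numeric, non-logical argument to median: the value column is of type character (the quotes around the values show that they are not numeric). Here are two simple ways to fix the problem: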
my_data %>%
  filter(group %in% c("Group 1", "Group 2")) %>%
  group_by(group) %>%
  summarize(the_medians = median(as.numeric(value), na.rm = TRUE))
or
my_data %>%
  filter(group %in% c("Group 1", "Group 2")) %>%
  mutate(value = as.numeric(value)) %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE))
To inspect the structure of the data, including the type of each column, you can conveniently use:
str(my_data)
#Classes ‘tbl_df’ and 'data.frame': 30 obs. of 2 variables:
# $ group: chr "Group 1" "Group 1" "Group 1" "Group 1" ...
# $ value: chr "5" "3" "6" "8" ...
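
For a quick way to try this out, here is a minimal sketch of the check-then-convert pattern on a small made-up data frame. The names toy and toy_medians and the values are hypothetical and not taken from the question; the sketch only assumes that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: `value` is stored as character, as in the question
toy <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  value = c("5", "3", "6", "7", "9", NA),
  stringsAsFactors = FALSE
)

# Check the column type before summarizing; TRUE here means median() would fail
is.character(toy$value)

# Convert once, then compute the per-group medians, ignoring missing values
toy_medians <- toy %>%
  mutate(value = as.numeric(value)) %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE))

toy_medians
# group A has median 5, group B has median 8
```

Converting the column once with mutate keeps every later summary free of repeated as.numeric calls, which is probably the lower-effort choice when more than one statistic is needed.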