I'm running into a very frustrating problem when using the dplyr group_by and summarise functions.
Here is my dataset:
> cum_ems_totals
Source: local data frame [12 x 4]
Chamber Total_emmissions Treatment Block
<fctr> <dbl> <fctr> <fctr>
1 1 5769.0507 U 1
2 3 7790.1426 IU 1
3 4 5166.8992 AN 1
4 5 7625.7319 AN 2
5 6 1964.0970 IU 2
6 7 5052.1268 U 2
7 9 4207.5324 IU 3
8 10 470.7014 AN 3
9 12 5675.9171 U 3
10 14 5666.1678 U 4
11 15 2134.5002 AN 4
12 16 4093.4687 IU 4
> str(cum_ems_totals)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 12 obs. of 4 variables:
$ Chamber : Factor w/ 13 levels "1","3","4","5",..: 1 2 3 4 5 6 7 8 9 11 ...
$ Total_emmissions: num [1:12, 1] 5769 7790 5167 7626 1964 ...
$ Treatment : Factor w/ 4 levels "U","IU","AN",..: 1 2 3 3 2 1 2 3 1 1 ...
$ Block : Factor w/ 5 levels "1","2","3","13",..: 1 1 1 2 2 2 3 3 3 5 ...
I now want to compute some summary statistics by Treatment:
cum_ems_summary <- cum_ems_totals %>% filter(Chamber != "10") %>%
group_by(Treatment) %>%
summarise(n = n(), Mean = mean(Total_emmissions, na.rm = TRUE),
SD = sd(Total_emmissions, na.rm = TRUE), SEM = SD/sqrt(n))
This gives me:
> cum_ems_summary
Source: local data frame [3 x 5]
Treatment n Mean SD SEM
<fctr> <int> <dbl> <dbl> <dbl>
1 U 4 5540.816 329.0763 164.5381
2 IU 4 4513.810 2415.6355 1207.8178
3 AN 3 4975.710 2750.6038 1588.0618
So far so good. However, if I try to plot this data with ggplot, I get the following error:
> ggplot(cum_ems_summary, aes(x = Treatment, y = Mean, fill = Treatment)) + geom_bar(stat = "identity")
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 3, 11
Calling str on the data frame gives this:
> str(cum_ems_summary)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 5 variables:
$ Treatment: Factor w/ 4 levels "U","IU","AN",..: 1 2 3
$ n : int 4 4 3
$ Mean :
Error in str.default(obj, ...) :
dims [product 11] do not match the length of object [3]
I don't understand what is going on here! Can anyone help?
Answer 0 (score: 3)
#Reproduce error
str(cum_ems_summary)
# Error in str.default(obj, ...) :
# dims [product 11] do not match the length of object [3]
#Fix: Total_emmissions is a one-column matrix; c() drops its dim attribute so it becomes a plain vector
cum_ems_totals$Total_emmissions <- c(cum_ems_totals$Total_emmissions)
#Try again
cum_ems_summary <- cum_ems_totals %>% filter(Chamber != "10") %>%
group_by(Treatment) %>%
summarise(n = n(), Mean = mean(Total_emmissions, na.rm = TRUE),
SD = sd(Total_emmissions, na.rm = TRUE), SEM = SD/sqrt(n))
ggplot(cum_ems_summary, aes(x = Treatment, y = Mean, fill = Treatment)) + geom_bar(stat = "identity")
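To see why the c() call helps: the `num [..., 1]` entry in the str output of cum_ems_totals suggests that Total_emmissions is stored as a one-column matrix rather than a plain vector. Here is a minimal base R sketch of that situation. The data frame `df` and the column `x` are made-up stand-ins, not the original data.

```r
# Hypothetical toy data: `x` stands in for Total_emmissions.
x_mat <- matrix(c(5769, 7790, 5167), ncol = 1)  # one-column matrix
is.matrix(x_mat)                                # TRUE

df <- data.frame(id = 1:3)
df$x <- x_mat                  # the matrix is stored as a single column
sapply(df, is.matrix)          # flags which columns are matrices

x_vec <- c(df$x)               # c() drops the dim attribute
is.null(dim(x_vec))            # TRUE: now a plain numeric vector

df$x <- x_vec                  # overwrite the column with the vector
sapply(df, is.matrix)          # no matrix columns remain
```

Running sapply(cum_ems_totals, is.matrix) on the real data should show whether any other columns need the same treatment.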
Answer 1 (score: 1)
I just ran into the same problem and solved it by adding mutate_if at the end, in case it's useful:
df2<- df%>%
group_by(group) %>%
mutate_each(funs(scale, mean)) %>%
mutate_if(is.matrix, as.vector)
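In recent dplyr versions mutate_each() and funs() are deprecated and mutate_if() is superseded by across(). A rough equivalent of the idea above, assuming dplyr 1.0 or later, might look like the sketch below. The data and the column names `group`, `value` and `value_scaled` are hypothetical.

```r
library(dplyr)

# Hypothetical data; `group` and `value` are illustrative names.
df <- tibble(group = c("a", "a", "b", "b"), value = c(1, 3, 10, 20))

df2 <- df %>%
  group_by(group) %>%
  mutate(value_scaled = scale(value)) %>%          # scale() returns a one-column matrix
  mutate(across(where(is.matrix), as.vector)) %>%  # flatten any matrix columns to vectors
  ungroup()

is.matrix(df2$value_scaled)  # FALSE after the across() step
```

The across(where(is.matrix), as.vector) step plays the same role as mutate_if(is.matrix, as.vector): it only touches columns that are matrices and leaves everything else alone.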
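Answer 2 (score: -1)
Could the error message not be related to the fact that your Treatment has 4 levels, when it should have 3 levels, "U", "IU", "AN", assigned the codes 1, 2, 3, plus an extra level ".." with no number assigned?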
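One way to check this idea directly: in str output the trailing `..` is how R abbreviates a long list of levels, so it is not a level itself. The sketch below uses a made-up factor (the level "X" is purely illustrative) to show how unused levels can be inspected and dropped.

```r
# Hypothetical factor with an unused fourth level ("X" is made up for illustration).
treatment <- factor(c("U", "IU", "AN", "AN"),
                    levels = c("U", "IU", "AN", "X"))

nlevels(treatment)   # unused levels still count towards the total
levels(treatment)    # full list of levels, without str()'s abbreviation
table(treatment)     # the unused level shows a count of 0

treatment <- droplevels(treatment)  # remove levels that never occur
nlevels(treatment)
```

If the error persists after droplevels() on the real data, the extra level is probably not the cause.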