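For each country, I would like to get the maximum of the summed durations of primary, middle and high school. I need the maximum because the durations may differ from year to year. I first grouped by country with group_by and then used colSums. However, every row gets the same value, the overall max(colSums) across all countries, which means group_by has no effect here.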
I did some research and have already detached "plyr". In fact, if I try
library(dplyr)

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))
it works fine. Here, though, the computation does not involve a single column. It spans many columns. Do you know how to solve this?
Thanks a lot.
library(dplyr)

data1 = data.frame(country = c("A",'A',"A",'A',"B","B","B","B"),
                   item = c("Age for primary school","Duration for primary school", "Duration for middle school", "duration for high school",
                            "Age for primary school","Duration for primary school", "Duration for middle school", "duration for high school"),
                   '2008' = c(6, 6, 4, 3,7,5,4,3),
                   '2009' = c(6, 6, 4, 3,6,6,4,3),
                   '2010' = c(7, 5, 4, 3,6,6,4,3),
                   '2011' = c(7, 5, 4, 3,7,5,4,3))

# Drop the age rows, then take the largest yearly total per country.
# Problem: every row gets the same value, computed over all countries at once.
temp1 <- dplyr::filter(data1, item != 'Age for primary school') %>%
  dplyr::group_by(country) %>%
  dplyr::mutate(n_grade = max(colSums(.[,-c(1:2)], na.rm = TRUE)))
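The symptom can be made visible with a small sketch. It runs the same mutate() call and compares the result with a per-country computation done by hand with split(). The sketch assumes dplyr is installed. It recreates the question's data with check.names = FALSE so the year columns keep their original names. The names buggy and per_country are illustrative only.

```r
library(dplyr)

data1 <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  item = rep(c("Age for primary school", "Duration for primary school",
               "Duration for middle school", "duration for high school"), 2),
  "2008" = c(6, 6, 4, 3, 7, 5, 4, 3),
  "2009" = c(6, 6, 4, 3, 6, 6, 4, 3),
  "2010" = c(7, 5, 4, 3, 6, 6, 4, 3),
  "2011" = c(7, 5, 4, 3, 7, 5, 4, 3),
  check.names = FALSE
)

durations <- filter(data1, item != "Age for primary school")

# With %>%, `.` is bound to the whole (grouped) input of the pipe, so
# colSums() adds up the rows of both countries together.
buggy <- durations %>%
  group_by(country) %>%
  mutate(n_grade = max(colSums(.[, -(1:2)], na.rm = TRUE)))

# The intended computation, done separately for each country.
per_country <- sapply(split(durations[, -(1:2)], durations$country),
                      function(d) max(colSums(d, na.rm = TRUE)))

buggy$n_grade
per_country
```

With this data, every row of buggy gets 26, which is the best year for both countries combined. Each country on its own reaches only 13.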
Answer:
If you use . inside mutate, it refers to the left-hand side of the pipe, i.e. the whole data.frame/tibble, not the individual group. You can use do instead:
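library(dplyr)

# Inside do(), . is the data of the current group only.
# (do() is superseded since dplyr 1.0.0 but still works.)
temp1 <- dplyr::filter(data1, item != 'Age for primary school') %>%
  dplyr::group_by(country) %>%
  dplyr::do(mutate(., n_grade = max(colSums(.[,-c(1:2)], na.rm = TRUE))))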
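Newer versions of dplyr offer an alternative to do(). Assuming dplyr 1.1.0 or later, pick() returns the current group's columns inside mutate() or summarise(), so the original one-liner works without do(). In the sketch below, where(is.numeric) selects the year columns by type instead of dropping the first two columns by position. That choice is an editorial assumption, not part of the original answer.

```r
library(dplyr)  # pick() assumes dplyr >= 1.1.0

data1 <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  item = rep(c("Age for primary school", "Duration for primary school",
               "Duration for middle school", "duration for high school"), 2),
  "2008" = c(6, 6, 4, 3, 7, 5, 4, 3),
  "2009" = c(6, 6, 4, 3, 6, 6, 4, 3),
  "2010" = c(7, 5, 4, 3, 6, 6, 4, 3),
  "2011" = c(7, 5, 4, 3, 7, 5, 4, 3),
  check.names = FALSE
)

# pick() sees only the rows of the current country; where(is.numeric)
# keeps the year columns and skips `item` (grouping columns are never picked).
temp1 <- data1 %>%
  filter(item != "Age for primary school") %>%
  group_by(country) %>%
  mutate(n_grade = max(colSums(pick(where(is.numeric)), na.rm = TRUE))) %>%
  ungroup()

# Same computation, but one row per country instead of repeating the value.
totals <- data1 %>%
  filter(item != "Age for primary school") %>%
  group_by(country) %>%
  summarise(n_grade = max(colSums(pick(where(is.numeric)), na.rm = TRUE)))

totals
```

With the question's data, both countries come out at 13.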
Note that with data.table this would be:
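library(data.table)
setDT(data1)
# Drop the age rows, then add n_grade by reference within each country.
# .SD holds the current group's columns listed in .SDcols (the year columns).
temp1 <- data1[item != 'Age for primary school'][
  , n_grade := max(colSums(.SD, na.rm = TRUE))
  , by = country
  , .SDcols = -(1:2)]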
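For comparison, the same result can be obtained in base R without any packages. The sketch below relies on rowsum(), a base R function not to be confused with rowSums(). rowsum() adds up the rows that share a group label, which amounts to a per-country colSums(). apply() then takes the largest yearly total for each country. The names yearly_totals and n_grade are illustrative.

```r
data1 <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  item = rep(c("Age for primary school", "Duration for primary school",
               "Duration for middle school", "duration for high school"), 2),
  "2008" = c(6, 6, 4, 3, 7, 5, 4, 3),
  "2009" = c(6, 6, 4, 3, 6, 6, 4, 3),
  "2010" = c(7, 5, 4, 3, 6, 6, 4, 3),
  "2011" = c(7, 5, 4, 3, 7, 5, 4, 3),
  check.names = FALSE
)

durations <- data1[data1$item != "Age for primary school", ]

# One row per country, one column per year: summed durations.
yearly_totals <- rowsum(durations[, -(1:2)], group = durations$country,
                        na.rm = TRUE)

# Largest yearly total for each country (a named vector).
n_grade <- apply(yearly_totals, 1, max)

# Attach it to every row, like the mutate() and := versions do.
durations$n_grade <- unname(n_grade[as.character(durations$country)])

yearly_totals
```

Country A's yearly totals are 13, 13, 12, 12, and B's are 12, 13, 13, 12. Both therefore get n_grade = 13.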