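I'm facing a bit of code golf and I'm struggling. I'm stuck with a complex dataset in long format that I need in wide format for analysis. The conversion itself was easy. However, because of how the data were entered, the converted dataset contains redundant rows. Here is an MWE of the problem: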
id <- c("ana", "ana", "ana", "brad", "ana", "brad", "brad", "brad", "matt", "matt", "matt")
hour <- c(0, 0, 24, 0, 48, 24, NA, 72, 0, 24, 48)
assessment <- c("memory", "memory", "attention", "verbal", "attention", "memory", "attention", "attention", "memory", "attention", "attention")
value <- c(0.000, NA, 0.895, 0.000, 15.000, 3, 5, NA, 2, 4, 5)
mydata <- data.frame(id, hour, assessment, value)
Result:
> mydata
     id hour assessment  value
1   ana    0     memory  0.000
2   ana    0     memory     NA
3   ana   24  attention  0.895
4  brad    0     verbal  0.000
5   ana   48  attention 15.000
6  brad   24     memory  3.000
7  brad   NA  attention  5.000
8  brad   72  attention     NA
9  matt    0     memory  2.000
10 matt   24  attention  4.000
11 matt   48  attention  5.000
Then:
library(dplyr)
library(tidyr)
mydata %>%
  group_by(id) %>%
  mutate(i1 = row_number()) %>%
  spread(assessment, value)
which gives:
Source: local data frame [11 x 6]
Groups: id [3]

       id  hour    i1 attention memory verbal
*  <fctr> <dbl> <int>     <dbl>  <dbl>  <dbl>
1     ana     0     1        NA      0     NA
2     ana     0     2        NA     NA     NA
3     ana    24     3     0.895     NA     NA
4     ana    48     4    15.000     NA     NA
5    brad     0     1        NA     NA      0
6    brad    24     2        NA      3     NA
7    brad    72     4        NA     NA     NA
8    brad    NA     3     5.000     NA     NA
9    matt     0     1        NA      2     NA
10   matt    24     2     4.000     NA     NA
11   matt    48     3     5.000     NA     NA
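Note that ana has two entries for hour 0 and memory, and brad has one entry at hour 0 and another whose hour is missing. The missing hour should also be treated as zero; it is an input error by the person who collected the data.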
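One way to make the duplication visible is to count the rows per id, hour and assessment. This is only a minimal sketch: it rebuilds the example data so that it runs on its own, and the name dups is illustrative rather than part of the original code.

```r
library(dplyr)

# Rebuild the example data so the snippet is self-contained.
mydata <- data.frame(
  id = c("ana", "ana", "ana", "brad", "ana", "brad", "brad", "brad", "matt", "matt", "matt"),
  hour = c(0, 0, 24, 0, 48, 24, NA, 72, 0, 24, 48),
  assessment = c("memory", "memory", "attention", "verbal", "attention", "memory",
                 "attention", "attention", "memory", "attention", "attention"),
  value = c(0.000, NA, 0.895, 0.000, 15.000, 3, 5, NA, 2, 4, 5)
)

# Count rows per id/hour/assessment and keep combinations that occur more than once.
dups <- mydata %>%
  count(id, hour, assessment) %>%
  filter(n > 1)

print(dups)
```

Any combination listed in dups cannot be spread into a single cell without first being collapsed.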
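The table below shows how the entries for ana and brad should look. Duplicates with the same id and hour (including NA) should be collapsed/merged (see rows 1 and 5 below).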
       id  hour    i1 attention memory verbal
*  <fctr> <dbl> <int>     <dbl>  <dbl>  <dbl>
1     ana     0     1        NA      0     NA
2     ana    24     3     0.895     NA     NA
4     ana    48     4    15.000     NA     NA
5    brad     0     1     5.000     NA      0
6    brad    24     2        NA      3     NA
7    brad    72     4        NA     NA     NA
9    matt     0     1        NA      2     NA
10   matt    24     2     4.000     NA     NA
11   matt    48     3     5.000     NA     NA
Question: how can I collapse these duplicated rows?
Answer 0 (score: 1)
One option is to replace the NA values with 0, keep the distinct rows, and then proceed as in the OP's code:
mydata %>%
  # replace NA with 0 in both hour and value
  mutate_at(vars(hour, value), funs(replace(., is.na(.), 0))) %>%
  arrange(id, hour, desc(value)) %>%
  # drop rows that are now exact duplicates
  distinct() %>%
  group_by(id, hour, assessment) %>%
  spread(assessment, value)
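In recent releases of dplyr and tidyr, funs() is deprecated and spread() is superseded by pivot_wider(). A rough equivalent of the pipeline above might look like the following sketch, assuming dplyr 1.0 or later and tidyr 1.0 or later. The name wide is illustrative, and the example data are rebuilt so the snippet runs on its own. Like the answer, it turns missing values into 0 as well, so brad's attention at hour 72 comes out as 0 rather than NA.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data so the snippet is self-contained.
mydata <- data.frame(
  id = c("ana", "ana", "ana", "brad", "ana", "brad", "brad", "brad", "matt", "matt", "matt"),
  hour = c(0, 0, 24, 0, 48, 24, NA, 72, 0, 24, 48),
  assessment = c("memory", "memory", "attention", "verbal", "attention", "memory",
                 "attention", "attention", "memory", "attention", "attention"),
  value = c(0.000, NA, 0.895, 0.000, 15.000, 3, 5, NA, 2, 4, 5)
)

wide <- mydata %>%
  # replace NA with 0 in both hour and value, as the answer does
  mutate(across(c(hour, value), ~ replace(.x, is.na(.x), 0))) %>%
  # drop rows that are now exact duplicates
  distinct() %>%
  # one column per assessment, one row per id/hour
  pivot_wider(names_from = assessment, values_from = value)

print(wide)
```

If the missing values should stay NA, restricting the replacement to hour and collapsing the duplicates some other way would be needed; that variation is not shown here.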