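Apologies if this is a simple question. I have data in tidy (long) format. I want to see the difference in the set of values in `Factor Name` for each sample in `Sample Name`.
# Groups: Sample Name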
`Sample Name` `Factor Name` mean
<fct> <fct> <dbl>
1 S1 ABCD -5.15
2 S1 EFGH 7.74
3 S1 IJKL -7.43
4 S2 ABCD 4.35
5 S2 EFGH -2.15
6 S2 IJKL 2.33
7 S3 ABCD 5.53
8 S3 EFGH 2.84
9 S3 IJKL 1.61
10 S3 MNOP NaN
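I believe this can be done with the group_by function.
aggregate(`Factor Name` ~ `Sample Name`, df, FUN = function(x) setdiff(unique(df$`Factor Name`), x))
I also tried aggregate, as above. Although it produces output, I would prefer a group_by or pipe-based approach.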
If possible, I would also like to add the missing `Factor Name` for each `Sample Name`, as shown below:
# Groups: Sample Name
`Sample Name` `Factor Name` mean
<fct> <fct> <dbl>
1 S1 ABCD -5.15
2 S1 EFGH 7.74
3 S1 IJKL -7.43
4 S1 MNOP NaN
5 S2 ABCD 4.35
6 S2 EFGH -2.15
7 S2 IJKL 2.33
8 S2 MNOP NaN
9 S3 ABCD 5.53
10 S3 EFGH 2.84
11 S3 IJKL 1.61
12 S3 MNOP NaN
Answer 0 (score: 1)
The tidyr::expand and tidyr::complete functions can help you achieve your goal.
Load the packages:
library(dplyr)
library(tidyr)
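Create a dummy dataset:
# tibble() replaces the deprecated data_frame(); mean is random, so values differ on each run
df <- tibble(sample_name = factor(c(rep(c('S1', 'S2', 'S3'), each = 3), 'S3')),
             factor_name = factor(c(rep(c('ABCD', 'EFGH', 'IJKL'), 3), 'MNOP')),
             mean = rnorm(n = 10, sd = 10))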
Question 1
Get the difference in the set of values in factor_name for each sample in sample_name:
# Return ONLY those levels of sample_name that are missing a level of factor_name
df %>%
  # Expand to all unique combinations
  expand(sample_name, factor_name) %>%
  # Extract the difference
  setdiff(., select(df, -mean))
#> # A tibble: 2 x 2
#> sample_name factor_name
#> <fct> <fct>
#> 1 S1 MNOP
#> 2 S2 MNOP
# Return ALL levels of sample_name, along with any missing levels of factor_name
df %>%
  # Expand to all unique combinations
  expand(sample_name, factor_name) %>%
  # Extract the difference
  setdiff(., select(df, -mean)) %>%
  # Expand to show all levels of sample_name
  complete(sample_name)
#> # A tibble: 3 x 2
#> sample_name factor_name
#> <fct> <fct>
#> 1 S1 MNOP
#> 2 S2 MNOP
#> 3 S3 <NA>
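As an alternative to the setdiff step above, here is a minimal sketch of two other ways to find the missing combinations: a dplyr::anti_join against the expanded grid, and a group_by/summarise version closer to what the question asked for. The fixed mean values (taken from the question's table) and the names missing_pairs, all_factors and missing_by_sample are assumptions made for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Deterministic toy data with the same shape as df above
# (mean values copied from the question's table instead of rnorm)
df <- tibble(
  sample_name = factor(c(rep(c("S1", "S2", "S3"), each = 3), "S3")),
  factor_name = factor(c(rep(c("ABCD", "EFGH", "IJKL"), 3), "MNOP")),
  mean        = c(-5.15, 7.74, -7.43, 4.35, -2.15, 2.33, 5.53, 2.84, 1.61, NaN)
)

# Option A: all sample/factor combinations, minus the ones actually present
missing_pairs <- df %>%
  expand(sample_name, factor_name) %>%
  anti_join(df, by = c("sample_name", "factor_name"))

# Option B: one row per sample, with the missing factor names in a list column
all_factors <- levels(df$factor_name)
missing_by_sample <- df %>%
  group_by(sample_name) %>%
  summarise(
    missing = list(base::setdiff(all_factors, as.character(factor_name)))
  )
```

Option A keeps the result in long format, which is convenient for further joins. Option B keeps one row per sample, including samples with nothing missing (an empty character vector).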
Question 2
Add the missing factor_name for each sample_name:
# Expand to include ALL levels of factor_name within sample_name
df %>%
  complete(sample_name, factor_name)
#> # A tibble: 12 x 3
#> sample_name factor_name mean
#> <fct> <fct> <dbl>
#> 1 S1 ABCD 16.6
#> 2 S1 EFGH -0.0803
#> 3 S1 IJKL 4.80
#> 4 S1 MNOP NA
#> 5 S2 ABCD 3.80
#> 6 S2 EFGH -1.24
#> 7 S2 IJKL 1.50
#> 8 S2 MNOP NA
#> 9 S3 ABCD -5.94
#> 10 S3 EFGH 10.4
#> 11 S3 IJKL -14.3
#> 12 S3 MNOP -6.87
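The same idea can be applied to the question's original column names. The sketch below assumes the data looks like the table in the question (the tibble df_q is reconstructed here for illustration) and uses the fill argument of complete() so that the added rows get NaN, as in the desired output, rather than NA. Because the question's tibble is grouped by `Sample Name`, calling ungroup() first is probably needed; that step is an assumption about the asker's data, not something the answer states.

```r
library(dplyr)
library(tidyr)

# Reconstruction of the question's data (hypothetical object name df_q)
df_q <- tibble(
  `Sample Name` = factor(c(rep(c("S1", "S2", "S3"), each = 3), "S3")),
  `Factor Name` = factor(c(rep(c("ABCD", "EFGH", "IJKL"), 3), "MNOP")),
  mean          = c(-5.15, 7.74, -7.43, 4.35, -2.15, 2.33, 5.53, 2.84, 1.61, NaN)
)

completed <- df_q %>%
  # Drop any grouping so complete() works across the whole table
  ungroup() %>%
  # Add every missing Sample Name / Factor Name pair, filling mean with NaN
  complete(`Sample Name`, `Factor Name`, fill = list(mean = NaN))
```

Non-syntactic names such as `Sample Name` need backticks inside the pipeline; renaming them to sample_name and factor_name, as the answer does, avoids that.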
Created on 2018-05-10 by the reprex package (v0.2.0).