Hi, I have the following data frame:
library(tidyverse)
df <- data.frame(READS=rep(c('READa', 'READb', 'READc'),each=3) ,GENE=rep(c('GENEa', 'GENEb', 'GENEc'), each=3), COMMENT=rep(c('CommentA', 'CommentA', 'CommentA'),each=3))
> df
  READS  GENE  COMMENT
1 READa GENEa CommentA
2 READa GENEa CommentA
3 READa GENEa CommentA
4 READb GENEb CommentA
5 READb GENEb CommentA
6 READb GENEb CommentA
7 READc GENEc CommentA
8 READc GENEc CommentA
9 READc GENEc CommentA
I would like to convert it from long to wide format, aggregating over the GENE column, so that I get the following:
       GENEa GENEb GENEc
READSa     3     3     3
READSb     3     3     3
I tried the following, without success:
library(tidyverse)
df %>%
group_by(GENE) %>%
select(-COMMENT) %>%
spread(READS)
Note that the original data frame is large, so any optimized code would be helpful.
Thanks for your help.
Answer 0 (score: 2)
I'm not quite sure how you get a count of 3 for GENEa and READSb, but assuming you want the counts, you could try the following:
library(tidyverse)
df <- tibble(
READS = rep(c("READa", "READb", "READc"), each = 3),
GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#> READS GENE COMMENT
#> <chr> <chr> <chr>
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA
df %>%
count(READS, GENE) %>%
pivot_wider(
names_from = GENE, values_from = n,
values_fill = list(n = 0)
)
#> # A tibble: 3 x 4
#> READS GENEa GENEb GENEc
#> <chr> <int> <int> <int>
#> 1 READa 3 0 0
#> 2 READb 0 3 0
#> 3 READc 0 0 3
Created on 2019-12-13 by the reprex package (v0.3.0)
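For comparison, the two steps used in this answer (count each READS/GENE pair, then spread the counts into columns) can be sketched in base R without the tidyverse. This is an illustrative equivalent, not the answerer's code: aggregate() stands in for count(), and xtabs() stands in for pivot_wider() with values_fill = 0, since pairs that never occur come out as 0.

```r
# Same example data as in the question
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep("CommentA", 9),
  stringsAsFactors = FALSE
)

# Step 1 (like count(READS, GENE)): one row per observed pair, count in `n`
long <- aggregate(list(n = rep(1L, nrow(df))), by = df[c("READS", "GENE")], FUN = sum)

# Step 2 (like pivot_wider(..., values_fill = 0)): xtabs() sums `n` into a
# READS x GENE grid; combinations that never occur are filled with 0
wide <- xtabs(n ~ READS + GENE, data = long)
wide
```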
Answer 1 (score: 2)
Assuming you want each number in the output to be the number of input rows that have that cell's row and column names, this is a one-liner in base R:
table(df[1:2])
This gives the following object of class table:
GENE
READS GENEa GENEb GENEc
READa 3 0 0
READb 0 3 0
READc 0 0 3
If you want the result as a data frame:
as.data.frame.matrix(table(df[1:2]))
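A possible refinement, offered as a sketch rather than as part of the original answer: df[1:2] relies on READS and GENE being the first two columns, and as.data.frame.matrix() stores the READS labels as row names rather than as a column. Selecting the columns by name and moving the row names into a READS column gives a result shaped like the tidyverse answers.

```r
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep("CommentA", 9),
  stringsAsFactors = FALSE
)

# Cross-tabulate by column name instead of by position
tab <- table(df[c("READS", "GENE")])

# Convert to a data frame; the READS labels end up as row names
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper READS column and reset the row names
wide <- data.frame(READS = rownames(wide), wide, row.names = NULL,
                   check.names = FALSE, stringsAsFactors = FALSE)
wide
```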
Answer 2 (score: 1)
library(tidyr) # v1.0.0
pivot_wider(df, id_cols = -COMMENT, names_from = GENE, values_from = GENE,
            values_fn = list(GENE = length), values_fill = list(GENE = 0))
# A tibble: 3 x 4
READS GENEa GENEb GENEc
<fct> <int> <int> <int>
1 READa 3 0 0
2 READb 0 3 0
3 READc 0 0 3
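To see what values_fn = list(GENE = length) does here: pivot_wider() collects the GENE values that fall into each READS x GENE cell and applies length() to that vector, which counts the rows; values_fill then supplies 0 for cells that received no rows. A rough base-R analogue of that collect-then-summarise behaviour, given only as an illustration, is tapply() with FUN = length, where its default argument (available since R 3.4.0) plays the role of values_fill.

```r
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep("CommentA", 9),
  stringsAsFactors = FALSE
)

# For each (READS, GENE) cell, apply length() to the GENE values in that cell;
# cells with no rows get the `default` value instead of NA
m <- tapply(df$GENE, df[c("READS", "GENE")], FUN = length, default = 0L)
m
```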
Answer 3 (score: 1)
With dcast:
library(data.table)
dcast(setDT(df), READS ~ GENE, length)
# READS GENEa GENEb GENEc
#1: READa 3 0 0
#2: READb 0 3 0
#3: READc 0 0 3
Answer 4 (score: 0)
Given that some of the combinations in your desired output don't exist in the data:
df <- data.frame(READS=rep(c('READa', 'READb', 'READc'),each=3) ,GENE=rep(c('GENEa', 'GENEb', 'GENEc'), each=3), COMMENT=rep(c('CommentA', 'CommentA', 'CommentA'),each=3))
df %>%
group_by(READS, GENE) %>%
summarise(count = n()) %>%
spread(key = "GENE", value = "count")
results in:
READS GENEa GENEb GENEc
1 READa 3 NA NA
2 READb NA 3 NA
3 READc NA NA 3
Note that spread() is superseded and no longer recommended; with newer versions of tidyr you should use pivot_wider() instead.
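The NA cells in the output above appear because READS/GENE pairs that never occur have no count to spread. The same effect, and one way to replace the NAs with 0, can be sketched in base R with reshape(); this is an illustration added here, not the answerer's code. In tidyr, passing values_fill to pivot_wider() achieves the same thing.

```r
df <- data.frame(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep("CommentA", 9),
  stringsAsFactors = FALSE
)

# Long counts per observed (READS, GENE) pair, like group_by() + summarise(n())
long <- aggregate(list(count = rep(1L, nrow(df))), by = df[c("READS", "GENE")], FUN = sum)

# Widen: pairs that never occur have no row to draw from, so they become NA
wide <- reshape(long, idvar = "READS", timevar = "GENE", direction = "wide")

# Replace the NAs with 0 (what values_fill = 0 does in pivot_wider())
wide[is.na(wide)] <- 0L
wide
```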