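I have a question about how to convert multiple columns into a vector. I have the following dataset, and I want to group the rows by condition and put all the position counts for each condition into a single vector. I know I can use as.vector() to convert them one at a time, but I'm wondering whether there is a dplyr way to do it. Thanks!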
test <- structure(list(gene_id = c("gene0", "gene0", "gene0", "gene0",
"gene0", "gene0", "gene0", "gene0", "gene0", "gene0", "gene0",
"gene0", "gene0", "gene0", "gene0", "gene0", "gene0", "gene0",
"gene0", "gene0", "gene0", "gene0", "gene0", "gene0"), codon_index = c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L), position_1_count = c(2L, 7L, 8L,
0L, 2L, 22L, 19L, 15L, 134L, 1L, 127L, 30L, 0L, 0L, 1L, 4L, 65L,
234L, 1L, 3L, 57L, 0L, 4L, 16L), position_2_count = c(0L, 5L,
5L, 0L, 3L, 2L, 3L, 13L, 134L, 0L, 36L, 5L, 0L, 0L, 0L, 1L, 150L,
7L, 0L, 7L, 7L, 0L, 6L, 1L), position_3_count = c(0L, 2L, 1L,
0L, 4L, 0L, 3L, 32L, 43L, 3L, 9L, 1L, 0L, 0L, 0L, 4L, 105L, 1L,
0L, 14L, 5L, 0L, 6L, 1L), condition = structure(c(1L, 1L, 1L,
7L, 7L, 7L, 3L, 3L, 3L, 5L, 5L, 5L, 8L, 8L, 8L, 2L, 2L, 2L, 4L,
4L, 4L, 6L, 6L, 6L), .Label = c("c", "cup", "n", "nup", "p",
"pup", "min", "rich"), class = "factor")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -24L), .Names = c("gene_id",
"codon_index", "position_1_count", "position_2_count", "position_3_count",
"condition"))
> head(test)
# A tibble: 6 × 6
  gene_id codon_index position_1_count position_2_count position_3_count condition
    <chr>       <int>            <int>            <int>            <int>    <fctr>
1   gene0           1                2                0                0         c
2   gene0           2                7                5                2         c
3   gene0           3                8                5                1         c
4   gene0           1                0                0                0       min
5   gene0           2                2                3                4       min
6   gene0           3               22                2                0       min
How can this dataset be converted into the following (column names omitted here)?
2 0 0 7 5 2 8 5 1 c
0 0 0 2 3 4 22 2 0 min
Answer 0 (score: 2)
Another option:
library(purrr)
library(purrrlyr)  # slice_rows() and by_slice() moved here from purrr as of purrr 0.2.3
test %>%
slice_rows("condition") %>%
by_slice(function(x) unlist(x[-(1:2)]), .to = "vec")
which gives:
# condition vec
#1 c 2, 7, 8, 0, 5, 5, 0, 2, 1
#2 cup 4, 65, 234, 1, 150, 7, 4, 105, 1
#3 n 19, 15, 134, 3, 13, 134, 3, 32, 43
#4 nup 1, 3, 57, 0, 7, 7, 0, 14, 5
#5 p 1, 127, 30, 0, 36, 5, 3, 9, 1
#6 pup 0, 4, 16, 0, 6, 1, 0, 6, 1
#7 min 0, 2, 22, 0, 3, 2, 0, 4, 0
#8 rich 0, 0, 1, 0, 0, 0, 0, 0, 0
As @advance noted in the comments, if you want the values in row order instead:
test %>%
slice_rows("condition") %>%
by_slice(function(x) as.vector(t(x[-(1:2)])), .to = "vec")
# condition vec
#1 c 2, 0, 0, 7, 5, 2, 8, 5, 1
#2 cup 4, 1, 4, 65, 150, 105, 234, 7, 1
#3 n 19, 3, 3, 15, 13, 32, 134, 134, 43
#4 nup 1, 0, 0, 3, 7, 14, 57, 7, 5
#5 p 1, 0, 3, 127, 36, 9, 30, 5, 1
#6 pup 0, 0, 0, 4, 6, 6, 16, 1, 1
#7 min 0, 0, 0, 2, 3, 4, 22, 2, 0
#8 rich 0, 0, 0, 0, 0, 0, 1, 0, 0
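The difference between the two variants comes down to how R flattens a table: unlist() walks through it column by column, while transposing first yields row order. A minimal base R sketch on the three rows of condition c (the name `cond_c` is made up for this illustration):

```r
# Counts for condition "c" only, copied from the question's data.
cond_c <- data.frame(
  position_1_count = c(2L, 7L, 8L),
  position_2_count = c(0L, 5L, 5L),
  position_3_count = c(0L, 2L, 1L)
)

# Column-wise: all of position 1, then position 2, then position 3.
col_wise <- unname(unlist(cond_c))

# Row-wise: transpose to a matrix first, then drop the dimensions.
row_wise <- as.vector(t(cond_c))

print(col_wise)  # 2 7 8 0 5 5 0 2 1
print(row_wise)  # 2 0 0 7 5 2 8 5 1
```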
Or use do() instead of summarise(), adapting @DavidArenburg's comment:
test %>%
group_by(condition) %>%
select(position_1_count:condition) %>%
do(res = c(t(.[,-4])))
which gives:
# condition res
#1 c 2, 0, 0, 7, 5, 2, 8, 5, 1
#2 cup 4, 1, 4, 65, 150, 105, 234, 7, 1
#3 n 19, 3, 3, 15, 13, 32, 134, 134, 43
#4 nup 1, 0, 0, 3, 7, 14, 57, 7, 5
#5 p 1, 0, 3, 127, 36, 9, 30, 5, 1
#6 pup 0, 0, 0, 4, 6, 6, 16, 1, 1
#7 min 0, 0, 0, 2, 3, 4, 22, 2, 0
#8 rich 0, 0, 0, 0, 0, 0, 1, 0, 0
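For comparison, here is a sketch of the same grouping done with summarise() and a list column rather than do(). It assumes a dplyr version that accepts list columns in summarise(). The two-condition `small` data frame and the result name `out` are made up here, a cut-down copy of the question's data to keep the example short.

```r
library(dplyr)

# Cut-down copy of the question's data: conditions "c" and "min" only.
small <- data.frame(
  condition = factor(rep(c("c", "min"), each = 3)),
  position_1_count = c(2L, 7L, 8L, 0L, 2L, 22L),
  position_2_count = c(0L, 5L, 5L, 0L, 3L, 2L),
  position_3_count = c(0L, 2L, 1L, 0L, 4L, 0L)
)

# Bind the three count columns into a matrix per group, transpose it,
# and flatten with c() so the values come out in row order.
out <- small %>%
  group_by(condition) %>%
  summarise(res = list(c(t(cbind(position_1_count,
                                  position_2_count,
                                  position_3_count)))))

out$res[[1]]  # condition "c": 2 0 0 7 5 2 8 5 1
```

Wrapping the vector in list() is what lets summarise() keep one row per condition while storing nine values in it.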
Answer 1 (score: 1)
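Am I right that what you want is a separate vector of all the counts for each condition? If so, a combination of dplyr and tidyr should do it. First, I use gather to put all the counts into a single column. Then I split by condition and use lapply to produce a list containing a separate vector for each condition: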
test %>%
gather(Location, Count, starts_with("position")) %>%
split(.$condition) %>%
lapply(function(x){x$Count})
which gives:
$c
[1] 2 7 8 0 5 5 0 2 1
$cup
[1] 4 65 234 1 150 7 4 105 1
$n
[1] 19 15 134 3 13 134 3 32 43
$nup
[1] 1 3 57 0 7 7 0 14 5
$p
[1] 1 127 30 0 36 5 3 9 1
$pup
[1] 0 4 16 0 6 1 0 6 1
$min
[1] 0 2 22 0 3 2 0 4 0
$rich
[1] 0 0 1 0 0 0 0 0 0
If the order matters (it is wrong above), you should be able to sort before splitting, for example by adding arrange(codon_index) after the gather step.
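One way to sidestep the ordering issue, sketched here on the assumption that tidyr 1.0.0 or later is installed: pivot_longer(), the successor to gather(), emits the values one input row at a time, so no extra arrange() is needed. The `small` data frame and the name `vecs` are made up for this example, a cut-down copy of the question's data.

```r
library(dplyr)
library(tidyr)

# Cut-down copy of the question's data: conditions "c" and "min" only.
small <- data.frame(
  codon_index = rep(1:3, times = 2),
  position_1_count = c(2L, 7L, 8L, 0L, 2L, 22L),
  position_2_count = c(0L, 5L, 5L, 0L, 3L, 2L),
  position_3_count = c(0L, 2L, 1L, 0L, 4L, 0L),
  condition = factor(rep(c("c", "min"), each = 3))
)

# Each input row expands into three consecutive output rows, so the
# Count column is already in row order within every condition.
vecs <- small %>%
  pivot_longer(starts_with("position"),
               names_to = "Location", values_to = "Count") %>%
  split(.$condition) %>%
  lapply(function(x) x$Count)

vecs$c    # 2 0 0 7 5 2 8 5 1
vecs$min  # 0 0 0 2 3 4 22 2 0
```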
Answer 2 (score: 1)
Building on Peterson's idea, I think this code works best:
test %>%
  gather(Location, Count, starts_with("position")) %>%
  arrange(codon_index) %>%
  group_by(condition) %>%
  do(count = as.vector(t(.$Count)))
The result looks like this:
> ans = test %>% gather(Location, Count, starts_with("position")) %>% arrange(codon_index) %>% group_by(condition) %>% do(count = as.vector(t(.$Count)))
# A tibble: 8 × 2
condition count
* <fctr> <list>
1 c <int [9]>
2 cup <int [9]>
3 n <int [9]>
4 nup <int [9]>
5 p <int [9]>
6 pup <int [9]>
7 min <int [9]>
8 rich <int [9]>
> ans$count[[1]]
[1] 2 0 0 7 5 2 8 5 1
Thanks a lot for everyone's help!