Question

I want to find the two most highly expressed isoforms of each gene in my dataset:

ID,      position,  count
Gene1,   1300,      200
Gene1,   1400,      54
Gene1,   4500,      178
Gene1,   230,       450
Gene2,   4580,      80
Gene2,   549,       740
Gene2,   84,        199

The result should look like this:

ID,      position1,  p1-count,  position2,  p2-count
Gene1,   230,        450,       1300,       200
Gene2,   84,         199,       549,        740

Thanks for your help.

Answer 1

I also work with biological data, so I know what you're after.

d1 <- read.table(text="ID position count
Gene1 1300 200
Gene1 1400 54
Gene1 4500 178
Gene1 230 450
Gene2 4580 80
Gene2 549 740
Gene2 84 199", header = TRUE, as.is = TRUE)

library(dplyr)

# Sort by count (descending), then keep the first two rows of each gene
d2 <- d1 %>% group_by(ID) %>% arrange(desc(count)) %>%
  do(head(., 2)) %>% group_by(ID)

d2
# Source: local data frame [4 x 3]
# Groups: ID [2]

#      ID position count
#   (chr)    (int) (int)
# 1 Gene1      230   450
# 2 Gene1     1300   200
# 3 Gene2      549   740
# 4 Gene2       84   199
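Since dplyr 1.0.0, `do()` is superseded. A minimal sketch of the same selection step using `slice_max()`, assuming dplyr 1.0.0 or later is installed (`top2` is just an illustrative name):

```r
library(dplyr)

# Same toy data as above
d1 <- read.table(text = "ID position count
Gene1 1300 200
Gene1 1400 54
Gene1 4500 178
Gene1 230 450
Gene2 4580 80
Gene2 549 740
Gene2 84 199", header = TRUE, stringsAsFactors = FALSE)

# Keep the two rows with the largest count within each gene.
# with_ties = FALSE returns at most 2 rows per gene even if counts tie.
top2 <- d1 %>%
  group_by(ID) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```

With the default `with_ties = TRUE`, tied counts could return more than two rows for a gene, which would break any later step that assumes exactly two rows per gene.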

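I think d2 is already a convenient structure. Anyway, to get the layout you asked for, put the odd and even rows side by side with cbind: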

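# Odd rows hold each gene's top isoform, even rows its second one;
# this assumes every gene has exactly two rows in d2
cbind(d2[seq(1, nrow(d2), by=2), ], d2[seq(2, nrow(d2), by=2), -1])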
#      ID position count position count
# 1 Gene1      230   450     1300   200
# 2 Gene2      549   740       84   199
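The cbind step relies on every gene contributing exactly two rows, and it lists Gene2 as 549 then 84, whereas the question's expected output lists 84 first. That suggests (an inference, not stated in the question) that the two selected isoforms should be ordered by position. A base-R sketch under that assumption, which also produces the question's column names; `top2_wide` is a hypothetical helper name, and genes with fewer than `n` isoforms get NA:

```r
# Toy data from the question
d1 <- data.frame(
  ID       = c("Gene1", "Gene1", "Gene1", "Gene1", "Gene2", "Gene2", "Gene2"),
  position = c(1300, 1400, 4500, 230, 4580, 549, 84),
  count    = c(200, 54, 178, 450, 80, 740, 199),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per gene with its n highest-count isoforms,
# listed by ascending position (ordering inferred from the expected output)
top2_wide <- function(df, n = 2) {
  per_gene <- lapply(split(df, df$ID), function(g) {
    g <- head(g[order(-g$count), ], n)  # the n largest counts
    g <- g[order(g$position), ]         # then sort those by position
    out <- data.frame(ID = g$ID[1], stringsAsFactors = FALSE)
    for (i in seq_len(n)) {
      # Indexing past the end yields NA, so short genes still get all columns
      out[[paste0("position", i)]] <- g$position[i]
      out[[paste0("p", i, "-count")]] <- g$count[i]
    }
    out
  })
  res <- do.call(rbind, per_gene)
  rownames(res) <- NULL
  res
}

res <- top2_wide(d1)
print(res)
```

Because each gene is handled separately, this does not depend on row order in the input or on every gene having exactly two isoforms.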

Can anyone help me write an R script that gives me the top two expressed isoforms of each gene?
