I'm trying to work out how to generate new columns based on the first and last instance of a column value. My data looks like this:
DF <- structure(list(CHR = c(1, 1, 1, 1, 1, 1),
SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583","rs2292242"),
BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
KBdist= c(NA, 2215, 1135, 4366357, 1614613, 1590),
locus = c(1, 1, 1, 2, 3, 3)),
.Names = c("CHR","SNP","BP","KBdist","locus"),
row.names = c(NA, 6L),
class = "data.frame")
> DF
CHR SNP BP KBdist locus
1 rs2494631 2399149 NA 1
1 rs4648637 2401364 2215 1
1 rs2494627 2402499 1135 1
1 rs11122119 6768856 4366357 2
1 rs1844583 8383469 1614613 3
1 rs2292242 8385059 1590 3
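What I'm aiming for is: "for rows with the same locus, make start the BP of the first instance of that locus and stop the BP of the last instance of that locus". That would produce output looking like this: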
CHR SNP BP KBdist locus start stop
1 rs2494631 2399149 NA 1 2399149 2402499
1 rs4648637 2401364 2215 1 2399149 2402499
1 rs2494627 2402499 1135 1 2399149 2402499
1 rs11122119 6768856 4366357 2 6768856 6768856
1 rs1844583 8383469 1614613 3 8383469 8385059
1 rs2292242 8385059 1590 3 8383469 8385059
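I've been looking at the answers to a similar question I asked: Combining an ifelse statement with shift data.table function in R
and at data.table's shift function in R, but to no avail. Any help would be greatly appreciated!
Thanks.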
Answer 0 (score: 1)
You can do it with dplyr:
library(dplyr)
dat %>%
group_by(locus) %>%
mutate(start = first(BP),
stop = last(BP))
which gives:
## A tibble: 6 x 7
## Groups: locus [3]
# CHR SNP BP KBdist locus start stop
# <int> <fct> <int> <int> <int> <int> <int>
#1 1 rs2494631 2399149 NA 1 2399149 2402499
#2 1 rs4648637 2401364 2215 1 2399149 2402499
#3 1 rs2494627 2402499 1135 1 2399149 2402499
#4 1 rs11122119 6768856 4366357 2 6768856 6768856
#5 1 rs1844583 8383469 1614613 3 8383469 8385059
#6 1 rs2292242 8385059 1590 3 8383469 8385059
Data:
dat <- read.table(header = TRUE,
text = "
CHR SNP BP KBdist locus
1 rs2494631 2399149 NA 1
1 rs4648637 2401364 2215 1
1 rs2494627 2402499 1135 1
1 rs11122119 6768856 4366357 2
1 rs1844583 8383469 1614613 3
1 rs2292242 8385059 1590 3")
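If you would rather not load a package, the same per-locus first/last lookup can probably be done in base R with ave(). This is only a sketch, not the method from the answer above: it rebuilds the sample data as `dat` so it runs on its own, and it assumes the rows are already ordered by BP within each locus.

```r
# Sample data from the question, rebuilt so this block is self-contained.
dat <- data.frame(
  CHR    = c(1, 1, 1, 1, 1, 1),
  SNP    = c("rs2494631", "rs4648637", "rs2494627",
             "rs11122119", "rs1844583", "rs2292242"),
  BP     = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
  KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590),
  locus  = c(1, 1, 1, 2, 3, 3)
)

# ave() applies FUN within each locus group and returns a vector
# aligned with the original rows, so it can be assigned directly.
dat$start <- ave(dat$BP, dat$locus, FUN = function(x) x[1])          # first BP per locus
dat$stop  <- ave(dat$BP, dat$locus, FUN = function(x) x[length(x)])  # last BP per locus

print(dat)
```

Note that "first" and "last" here depend on row order. If the rows might not be sorted by BP within a locus, swapping the two functions for min and max gives the extent of each locus regardless of order.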