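I have a data frame with one column, and I want to create a second column based on a condition on the values in the first column. The script I've written so far works, but it is slow because the data frame has about 500,000 rows (509,744).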
```r
data <- read.table("~/Documents/git_repos/Aspen/Reference_genome/Potrs01-genome_mod_id.txt")
dim(data)
# [1] 509744      1
head(data)
#            V1
# 1 Potrs000004
# 2 Potrs000004
# 3 Potrs000004
# 4 Potrs000004
# 5 Potrs000004
# 6 Potrs000004
```
```r
test <- paste("Potrs00000", seq(000001, 10000, by = 1), sep = "")
length(test)
# [1] 10000
head(test)
# [1] "Potrs000001" "Potrs000002" "Potrs000003" "Potrs000004" "Potrs000005"
# [6] "Potrs000006"
```
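The `paste()` call above only produces correctly padded IDs for single-digit numbers: `paste("Potrs00000", 10, sep = "")` gives `"Potrs0000010"` rather than `"Potrs000010"`, so IDs from 10 upward could never match the file. A minimal sketch using `sprintf()` with a fixed-width format, assuming every ID is "Potrs" followed by exactly six digits (as in `Potrs000004`):

```r
# Assumption: IDs are "Potrs" plus a zero-padded 6-digit number.
ids <- sprintf("Potrs%06d", 1:10000)

# The paste() version, for comparison: padding is only right for 1-digit numbers.
bad <- paste("Potrs00000", 1:10000, sep = "")

head(ids, 3)
ids[10]
bad[10]
```

`%06d` pads with leading zeros to a width of six, so the ID length stays constant no matter how many digits the number has.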
```r
test.m <- matrix("NA", nrow = 509744, ncol = 2)
dim(test.m)
# [1] 509744      2
head(test.m)
#      [,1] [,2]
# [1,] "NA" "NA"
# [2,] "NA" "NA"
# [3,] "NA" "NA"
# [4,] "NA" "NA"
# [5,] "NA" "NA"
# [6,] "NA" "NA"
```
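```r
# Compare every ID in `test` with every row of data$V1
for (i in test) {
  for (j in data$V1) {
    if (i == j)
      test.m[,1] = j      # without braces, only this line is governed by the if
    test.m[,2] = "chr9"
  }
}
```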
```r
test.d <- as.data.frame(test.m)
head(test.d)
#            V1   V2
# 1 Potrs000004 chr9
# 2 Potrs000004 chr9
# 3 Potrs000004 chr9
# 4 Potrs000004 chr9
# 5 Potrs000004 chr9
# 6 Potrs000004 chr9
```
Is there a way to change this code to make it faster?
Answer (score: 2)
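It looks like you want the values of `V1` in `data` that match elements of `test`. I would do this with `data.table`:

```r
library(data.table)
setDT(data)                          # convert data to a data.table in place
data[, .(V1[V1 %in% test], "chr9")]  # matching IDs plus a constant "chr9" column
```

Note that the result is already a `data.table` (which is also a `data.frame`).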
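For comparison, here is a minimal base R sketch of the same idea without `data.table`. The nested loop in the question evaluates `i == j` for every pair of 10,000 IDs and 509,744 rows (about 5.1 billion comparisons in interpreted R), and `test.m[,1] = j` assigns to the whole column on each match rather than to one row. `%in%` (defined in base R via `match()`) performs the membership test for all rows in a single vectorized call. The toy `data` below is hypothetical, and `test` is built with `sprintf("Potrs%06d", ...)` on the assumption that IDs are always zero-padded to six digits:

```r
# Hypothetical toy stand-in for the 509,744-row file read with read.table().
data <- data.frame(
  V1 = c("Potrs000004", "Potrs000004", "Potrs020000", "Potrs000007"),
  stringsAsFactors = FALSE
)

# Assumption: IDs are "Potrs" plus a zero-padded 6-digit number.
test <- sprintf("Potrs%06d", 1:10000)

# One vectorized membership test replaces both loops:
# TRUE for each row whose ID occurs anywhere in `test`.
keep <- data$V1 %in% test

# Keep the matching rows and label every one of them "chr9".
# rep(..., sum(keep)) also behaves correctly when nothing matches.
test.d <- data.frame(
  V1 = data$V1[keep],
  V2 = rep("chr9", sum(keep)),
  stringsAsFactors = FALSE
)
print(test.d)
```

Here `Potrs020000` is dropped because 20000 lies outside `1:10000`; the other three rows are kept and labeled `chr9`.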