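Suppose I have a data frame with a few numbers in the first column. I want to use those numbers as positions in a string and extract the substring that contains the 2 characters before and after each position. To clarify: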
aggSN <- data.frame(V1=c(5,6,7,8),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA" # <- take this string
aggSN # <- take the numbers in the first column
# V1 V2
# 5 blah
# 6 blah
# 7 blah
# 8 blah
and create a new column V3, like this:
aggSN
# V1 V2 V3
# 5 blah SDAFK # <- took the two characters before and after the 5th character
# 6 blah DAFKS # <- took the two characters before and after the 6th character
# 7 blah AFKSD # <- took the two characters before and after the 7th character
# 10 blah SDAFJ # <- took the two characters before and after the 10th character
# 2 blah AJSD # <- here you can see that the substring is cut off at the start of the string
Currently I am using a for loop. It works, but it takes a lot of time on very large data frames and long strings. Is there an alternative? Thanks.
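fillvector <- character(nrow(aggSN))  # preallocate one result per row
for (j in seq_len(nrow(aggSN))) {
  # take 2 characters on each side of the position stored in V1
  fillvector[j] <- substr(gen, aggSN$V1[j] - 2, aggSN$V1[j] + 2)
}
aggSN$V3 <- fillvector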
Answer 0 (score: 4)
substring() is vectorized, so you can do this without writing a loop:
aggSN <- data.frame(V1=c(5,6,7,8,2),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"
with(aggSN, substring(gen, V1-2, V1+2))
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"
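A minimal sketch of why this works: substring() recycles its text argument over the first and last vectors, and it clips a start below 1 or a stop beyond nchar(text), so windows near either end of the string simply come out shorter. The helper name extract_window and its k parameter (number of characters kept on each side) are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: for each position in `pos`, return the substring of
# `text` running from k characters before to k characters after it.
extract_window <- function(text, pos, k = 2) {
  # substring() is vectorized over first/last and recycles `text`;
  # a start below 1 or a stop past nchar(text) is clipped automatically.
  substring(text, pos - k, pos + k)
}

gen <- "AJSDAFKSDAFJKLASDFKJKA"
extract_window(gen, c(5, 10, 2, 21))
# [1] "SDAFK" "SDAFJ" "AJSD"  "KJKA"
extract_window(gen, 10, k = 1)
# [1] "DAF"
```

Position 21 shows the clipping at the end of the string, just as position 2 does at the start.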
So to add the new column:
aggSN$V3 <- with(aggSN, substring(gen, V1-2, V1+2))
aggSN
# V1 V2 V3
# 1 5 blah SDAFK
# 2 6 blah DAFKS
# 3 7 blah AFKSD
# 4 8 blah FKSDA
# 5 2 blah AJSD
If you want something a bit faster, I would use stringi::stri_sub() instead of substring().
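A sketch of the stringi variant, assuming the package is installed. One caveat: in stri_sub() a negative start index counts from the end of the string, so a position of 1 (which gives a start of -1) would not be clipped the way substring() clips it. Clamping the start with pmax(..., 1) avoids that. The fallback to substring() is only there so the snippet also runs without stringi; the extra row with position 1 is added to show the clamping.

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 8, 2, 1), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Clamp the window start at 1: stringi treats negative indices as
# counting from the end of the string, unlike substring().
from <- pmax(aggSN$V1 - 2, 1)
to   <- aggSN$V1 + 2

if (requireNamespace("stringi", quietly = TRUE)) {
  aggSN$V3 <- stringi::stri_sub(gen, from, to)
} else {
  aggSN$V3 <- substring(gen, from, to)  # base R fallback, same result
}
aggSN$V3
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"  "AJS"
```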
Answer 1 (score: 2)
aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x-2, x+2))
should do it.
> aggSN
V1 V2 V3
1 5 blah SDAFK
2 6 blah DAFKS
3 7 blah AFKSD
4 8 blah FKSDA
Using your different example:
> aggSN
V1 V2 V3
1 5 blah SDAFK
2 6 blah DAFKS
3 7 blah AFKSD
4 10 blah SDAFJ
5 2 blah AJSD
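If you stay with the apply-family approach, vapply() is a slightly stricter drop-in for sapply(): it declares that every call returns exactly one string, so the result is always a character vector. It still calls substr() once per row, so on large inputs the single vectorized substring() call from the other answer is usually the better choice. A minimal sketch with the same example data:

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 10, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# One substr() call per row; FUN.VALUE = character(1) makes vapply()
# check that each call returns a single string.
aggSN$V3 <- vapply(aggSN$V1, function(x) substr(gen, x - 2, x + 2),
                   FUN.VALUE = character(1))

# Same result as the vectorized version
stopifnot(identical(aggSN$V3, substring(gen, aggSN$V1 - 2, aggSN$V1 + 2)))
aggSN$V3
# [1] "SDAFK" "DAFKS" "AFKSD" "SDAFJ" "AJSD"
```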