Question

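Suppose I have a data frame with some numbers in the first column. I want to use these numbers as positions in a string and get the substring that contains the 2 characters before and after each position. To clarify: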

aggSN <- data.frame(V1=c(5,6,7,8),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"  # <- take this string
aggSN                            # <- take the numbers in the first column
# V1    V2
#  5  blah
#  6  blah
#  7  blah
#  8  blah

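and create a new column V3 like this (the rows below use a slightly different set of positions, including one near the start of the string):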

aggSN                           
# V1    V2    V3
#  5  blah SDAFK   # <- took the two characters before and after the 5th character
#  6  blah DAFKS   # <- took the two characters before and after the 6th character 
#  7  blah AFKSD   # <- took the two characters before and after the 7th character 
# 10  blah SDAFJ   # <- took the two characters before and after the 10th character 
#  2  blah AJSD    # <- only one character exists before position 2, so the substring is cut off

Currently I am using a for loop. It works, but it takes a lot of time on very large data frames and long strings. Is there an alternative? Thanks.

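# Preallocate the result instead of growing it inside the loop
fillvector <- character(nrow(aggSN))
for (j in seq_len(nrow(aggSN))) {
  # take 2 characters on each side of position aggSN$V1[j]
  fillvector[j] <- substr(gen, aggSN$V1[j] - 2, aggSN$V1[j] + 2)
}
aggSN$V3 <- fillvector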

Answer 1

You can do this without writing a loop by using substring(), which is vectorized over its first and last arguments:

aggSN <- data.frame(V1=c(5,6,7,8,2),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA" 

with(aggSN, substring(gen, V1-2, V1+2))
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"
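The last element shows the boundary behaviour: substring() clips out-of-range bounds to the string, so a start below 1 is treated as 1 and an end past the last character simply stops there. The sketch below uses positions near both ends of gen (1, 2, 21 and 22 are just illustrative values) to show the windows getting shorter instead of raising an error.

```r
gen <- "AJSDAFKSDAFJKLASDFKJKA"   # 22 characters
pos <- c(1, 2, 21, 22)            # illustrative positions near both ends

# Windows of 2 characters on each side; out-of-range bounds are clipped
windows <- substring(gen, pos - 2, pos + 2)
print(windows)
# expected values: "AJS" "AJSD" "KJKA" "JKA"
```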

So, to add the new column:

aggSN$V3 <- with(aggSN, substring(gen, V1-2, V1+2))
aggSN
#   V1   V2    V3
# 1  5 blah SDAFK
# 2  6 blah DAFKS
# 3  7 blah AFKSD
# 4  8 blah FKSDA
# 5  2 blah  AJSD
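To check that the vectorized call is a drop-in replacement for the loop in the question, you can compute both and compare them. This is only a sanity check on the example data; the loop indexes with aggSN$V1[j] because aggSN is a plain data.frame.

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 8, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Loop version: one substr() call per row
loop_result <- character(nrow(aggSN))
for (j in seq_len(nrow(aggSN))) {
  loop_result[j] <- substr(gen, aggSN$V1[j] - 2, aggSN$V1[j] + 2)
}

# Vectorized version: a single substring() call for all rows
vec_result <- with(aggSN, substring(gen, V1 - 2, V1 + 2))

print(identical(loop_result, vec_result))
```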

If you want something faster, I would use stringi::stri_sub instead of substring().
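One caveat if you switch to stringi::stri_sub: it interprets negative positions as counting back from the end of the string, whereas substring() treats any start below 1 as 1. Near the start of the string (for example V1 = 1, which gives a start of -1) the two functions can therefore disagree. A minimal sketch, assuming you want to clamp the bounds explicitly before calling either function; window_bounds() is a hypothetical helper, not part of base R or stringi, and only base R is used here.

```r
# Hypothetical helper: bounds of a window of half-width k around each
# position, clamped so that from >= 1 and to <= nchar(text)
window_bounds <- function(text, pos, k = 2) {
  n <- nchar(text)
  data.frame(from = pmax(pos - k, 1), to = pmin(pos + k, n))
}

gen <- "AJSDAFKSDAFJKLASDFKJKA"
pos <- c(1, 2, 5, 22)
b <- window_bounds(gen, pos)

# All bounds now lie inside 1..nchar(gen), so substring() and
# stringi::stri_sub() should return the same windows for them
windows <- substring(gen, b$from, b$to)
print(b)
print(windows)
```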

Answer 2

aggSN$V3 <- sapply(aggSN$V1, function(x) substr(gen, x-2, x+2))

That should do it.

> aggSN
  V1   V2    V3
1  5 blah SDAFK
2  6 blah DAFKS
3  7 blah AFKSD
4  8 blah FKSDA

Using your different example:

> aggSN
  V1   V2    V3
1  5 blah SDAFK
2  6 blah DAFKS
3  7 blah AFKSD
4 10 blah SDAFJ
5  2 blah  AJSD
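A possible variation on this answer, assuming you want R to check that every call returns exactly one string: vapply() works like sapply() but takes a FUN.VALUE template (here character(1)) and raises an error if any result does not match it. It still calls substr() once per row, so for very large inputs the single vectorized substring() call from the other answer is likely to be faster.

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 10, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# vapply() is like sapply(), but each result must be a single string
aggSN$V3 <- vapply(aggSN$V1,
                   function(x) substr(gen, x - 2, x + 2),
                   character(1))
print(aggSN)
```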

R: create a new column without a for loop
