Split a character string into substrings at positions given by a second, numeric vector
vec <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
split.points <- c(25, 32, 55, 90, 124)
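I want to cut the character string above into six substrings at the positions given in the split.points vector.
This sounds simple, but the split commands I know of only work with a regular expression (pattern) or with substrings of a fixed length.
Any help would be appreciated.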
Answer 0 (score: 6)
We can try substring, which is vectorized over its start and end positions:
substring(
vec,
c(1, split.points + 1),
c(split.points, nchar(vec))
)
# [1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL"
# [3] "NYEYLPTMGLKSFIQASLALLFG" "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"
# [5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP" "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
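The call above works because each start position is paired with its matching end position, so all pieces are extracted at once. A minimal sketch that wraps the same idea in a reusable function; the name split_at is hypothetical (it is not part of base R), and it assumes the split points are sorted, strictly increasing, and smaller than the length of the string:

```r
# Hypothetical helper (not part of base R): split a single string after
# each position in `points`. Assumes `points` is sorted, strictly
# increasing, and lies within 1 .. nchar(x) - 1.
split_at <- function(x, points) {
  stopifnot(is.character(x), length(x) == 1L)
  starts <- c(1, points + 1)     # each piece starts right after the previous split
  ends   <- c(points, nchar(x))  # the last piece runs to the end of the string
  substring(x, starts, ends)
}

pieces <- split_at("abcdefghij", c(2, 5))
pieces  # the three pieces are "ab", "cde" and "fghij"
```

With the data from the question, split_at(vec, split.points) should return the same six substrings as the substring call above.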
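Answer 1 (score: 4)
Another option is read.fwf. The field widths have to cover the whole string, including the final piece after the last split point, to get all six substrings:
unlist(read.fwf(textConnection(vec),
                widths = diff(c(0, split.points, nchar(vec))),
                as.is = TRUE),
       use.names = FALSE)
which gives:
[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ"
[2] "ISQDPSL"
[3] "NYEYLPTMGLKSFIQASLALLFG"
[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"
[5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"
[6] "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"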
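I would not be surprised if your character vector comes from a data file. In that case read.fwf is especially useful, because it splits every record in one call. An example: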
vec2 <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM
LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
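read.fwf(textConnection(vec2),
         widths = diff(c(0, split.points, nchar(vec))),  # each record is as long as vec
         as.is = TRUE)
which gives (each row shown on one line for readability):
  V1                        V2      V3                      V4                                  V5                                 V6
1 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM
2 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM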
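If the records really do live in a file, read.fwf can read the file path directly. A minimal sketch under stated assumptions: the two short records, the split points, and the temporary file are made up for illustration, and every record is assumed to have the same length:

```r
# Sketch: read fixed-width records from a file instead of a text connection.
# The records and split points below are made-up illustration data.
records <- c("AAABBCCCC", "XXXYYZZZZ")
points  <- c(3, 5)                   # split after positions 3 and 5
path <- tempfile(fileext = ".txt")   # temporary file standing in for a data file
writeLines(records, path)

# Turn cumulative split positions into field widths, including the final
# field that runs to the end of each 9-character record.
widths <- diff(c(0, points, nchar(records[1])))
fields <- read.fwf(path, widths = widths, as.is = TRUE)
unlink(path)                         # clean up the temporary file
fields                               # one row per record, columns V1 to V3
```

Computing the widths with diff keeps the split positions as the single source of truth, so the first width does not have to be typed in by hand.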
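Answer 2 (score: 3)
We can use separate from tidyr, which accepts a numeric vector of split positions as sep:
library(tidyverse)
# tibble() replaces the deprecated data_frame()
tibble(vec = vec) %>%
  separate(vec, into = paste0('vec', 1:6), sep = split.points) %>%
  unlist(., use.names = FALSE)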
#[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL" "NYEYLPTMGLKSFIQASLALLFG"
#[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK" "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"
#[6] "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
A base R option is substr, applied to each start/stop pair with mapply:
unname(mapply(substr, vec, start = c(1, split.points+1), stop = c(split.points, nchar(vec))))
#[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ" "ISQDPSL" "NYEYLPTMGLKSFIQASLALLFG"
#[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK" "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP" "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
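Whichever approach is used, a quick sanity check is that the pieces paste back into the original string and that their lengths match the gaps between the split points. A minimal sketch on a short made-up string; the variable names are illustrative only:

```r
# Made-up example data: split "abcdefghij" after positions 2 and 5.
x   <- "abcdefghij"
pts <- c(2, 5)
pieces <- substring(x, c(1, pts + 1), c(pts, nchar(x)))

# Check 1: concatenating the pieces reproduces the input.
reassembles <- identical(paste(pieces, collapse = ""), x)

# Check 2: piece lengths equal the differences between consecutive cut
# positions, with 0 and nchar(x) added as the outer boundaries.
lengths_ok <- identical(nchar(pieces), as.integer(diff(c(0, pts, nchar(x)))))

reassembles && lengths_ok
```

The same two checks can be run on vec and split.points from the question by substituting them for x and pts.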