Extract text from a string in R and store it in a variable

Time: 2015-07-23 06:53:50

Tags: r

I have a character vector like this:

> filenames
[1] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv"
[2] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv"
[3] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv"
[4] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv" 
[5] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv" 
[6] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv" 
[7] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv" 
[8] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv"  
[9] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv"  
[10] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv"  
[11] "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"      

I want to extract the values that come before v and b and store them in a variable. Let me explain.

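Starting with filenames[1], I want to get the '20' that comes before v and the '40' that comes before b, and store the result in a variable: r[1] = 20/40.

I want to do this for each filenames[i]. For file names that contain 'cont. v', I want to write r[8] = 10, r[9] = 10. Here 10 is a predefined value.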

Please help me with this.

2 answers:

Answer 0: (score: 1)

You can try

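 library(stringr)
 # TRUE for the 'cont. v' files, FALSE for the 'v ... b' files
 indx <- grepl('cont', filenames)
 # digits that are followed by whitespace and then v or b
 lst <- str_extract_all(filenames[!indx], '(\\d+)(?=\\s+(v|b))')
 # ratio v / b for each of those files
 v1 <- sapply(lst, function(x) as.numeric(x[1])/as.numeric(x[2]))

 # for the 'cont. v' files, take the number just before '.csv'
 v2 <- as.numeric(str_extract(filenames[indx], '\\d+(?=\\.csv)'))
 r <- numeric(length(filenames))
 r[indx] <- v2
 r[!indx] <- v1
 r
 #[1]  0.5000000  0.2500000  0.5000000  0.5000000  0.5000000  1.0000000
 #[7]  0.3333333 10.0000000 11.0000000 12.0000000  0.5000000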

Data

filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 40 b - 11.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/30 v 60 b - 12.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 7.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 20 b - 8.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/10 v 30 b - 9.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 10.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 12.csv", 
"C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 6.csv"
)
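
For readers without stringr, the same idea can be sketched in base R. This is only an illustration, not the answerer's code: it assumes the 'cont. v' files should get the fixed predefined value 10 mentioned in the question (the answer above instead takes the number before '.csv'), and the names cont_value, is_cont, nums and ratio are hypothetical. It uses three of the file names from the question.

```r
# Three file names taken from the question
filenames <- c(
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/5 v 10 b - 6.csv",
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/cont. v - 11.csv"
)

cont_value <- 10  # assumption: the predefined value from the question

# Work on the file name only, so digits in directory names cannot interfere
base <- basename(filenames)

# TRUE for the 'cont. v' files
is_cont <- grepl("cont. v", base, fixed = TRUE)

# Digits followed by whitespace and then v or b (lookahead requires perl = TRUE)
nums <- regmatches(base, gregexpr("[0-9]+(?=\\s+[vb])", base, perl = TRUE))

# Ratio v / b when both numbers were found, NA otherwise
ratio <- vapply(nums, function(x) {
  if (length(x) == 2) as.numeric(x[1]) / as.numeric(x[2]) else NA_real_
}, numeric(1))

# Predefined value for 'cont. v' files, ratio for the rest
r <- ifelse(is_cont, cont_value, ratio)
print(r)
```

Using vapply with numeric(1) keeps r a plain numeric vector even if a file name matches neither pattern; such entries come out as NA rather than raising an error.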

Answer 1: (score: 0)

Along the lines of the help for ?regexp:

parse.one <- function(res, result) {
  m <- do.call(rbind, lapply(seq_along(res), function(i) {
    if(result[i] == -1) return("")
    st <- attr(result, "capture.start")[i, ]
    substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
  }))
  colnames(m) <- attr(result, "capture.names")
  m
}

filenames <- c("C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/22 v 44 b - 10.csv",
               "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/223 v 5 b - 10.csv")
regex <- '.*/(?<v>[0-9]+)\\ v\\ (?<b>[0-9]+)\\ b.*'
parsed <- regexpr(regex,filenames, perl=TRUE)
parse.one(filenames, parsed)

The parse.one function only needs to be defined once.
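
The matrix returned by parse.one holds the captured values as character strings, so one more step is needed to get the ratio asked for in the question. A minimal sketch, reusing the function, regex and file names from this answer; the variable names m and r are assumptions for illustration:

```r
# Helper from the answer: turns the capture attributes of a regexpr()
# result into a character matrix with one column per named group
parse.one <- function(res, result) {
  m <- do.call(rbind, lapply(seq_along(res), function(i) {
    if (result[i] == -1) return("")
    st <- attr(result, "capture.start")[i, ]
    substring(res[i], st, st + attr(result, "capture.length")[i, ] - 1)
  }))
  colnames(m) <- attr(result, "capture.names")
  m
}

filenames <- c(
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/20 v 40 b - 10.csv",
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/22 v 44 b - 10.csv",
  "C:/Users/USER/Desktop/Magnetic field vs. vacuum level/Data/223 v 5 b - 10.csv"
)
regex <- '.*/(?<v>[0-9]+)\\ v\\ (?<b>[0-9]+)\\ b.*'
parsed <- regexpr(regex, filenames, perl = TRUE)

# Character matrix with columns "v" and "b"
m <- parse.one(filenames, parsed)

# Convert both columns to numbers and divide element-wise
r <- as.numeric(m[, "v"]) / as.numeric(m[, "b"])
print(r)  # 0.5 0.5 44.6
```

This sketch only covers file names that match the 'v ... b' pattern; the 'cont. v' files from the question would need separate handling.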