Extract numbers with their units from a string

Posted: 2015-09-05 10:39:35

Tags: r gsub strsplit

I have a series of strings like the following:

x <- " 20 to 80% of the sward should be between 3 and 10cm tall, 
with 20 to 80% of the sward between 10 and 30cm tall"

I want to extract the numeric values while keeping their units. I tried the following:

x <- lapply(x, function(x){gsub("[^\\d |cm\\b |mm\\b |% ]", "", x, perl = T)})

which gives:

" 20  80%       3  10cm   20  80%     10  30cm "

What I need is:

"20 80%" "3 10cm" "20 80%" "10 30cm" 

Thanks for reading.
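For comparison, here is a minimal base R sketch (not taken from either answer) that matches each range directly instead of extracting single numbers and pairing them afterwards. It assumes every range is written as "<number> to <number><unit>" or "<number> and <number><unit>", and that the only units are %, cm and mm; other phrasings or units would need the pattern extended.

```r
x <- " 20 to 80% of the sward should be between 3 and 10cm tall, \nwith 20 to 80% of the sward between 10 and 30cm tall"

# A range: digits, the connecting word "to" or "and", then digits plus a unit.
# The unit list (%, cm, mm) is an assumption; extend it as needed.
pattern <- "\\d+ (?:to|and) \\d+(?:%|cm|mm)"
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

# Drop the connecting word, keeping "lower upper<unit>"
ranges <- sub("(\\d+) (?:to|and) ", "\\1 ", hits, perl = TRUE)
ranges
# c("20 80%", "3 10cm", "20 80%", "10 30cm")
```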

2 answers:

Answer 0 (score: 3)

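We can use str_extract_all from library(stringr) to extract the elements that match the pattern, i.e. a run of digits followed by any non-whitespace characters such as the unit (modified based on @PierreLafortune's comment):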

library(stringr)
# '\\d+\\S*': one or more digits, then any non-space characters (the unit, if present)
lst <- str_extract_all(x, '\\d+\\S*')
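The same extraction can also be done without stringr. The sketch below is an assumed base R equivalent using regmatches() with gregexpr(); it applies the answer's pattern to the question's string and, like str_extract_all(), returns a list with one character vector per input string.

```r
x <- " 20 to 80% of the sward should be between 3 and 10cm tall, \nwith 20 to 80% of the sward between 10 and 30cm tall"

# Base R counterpart of stringr::str_extract_all(x, '\\d+\\S*')
lst <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))
lst[[1]]
# c("20", "80%", "3", "10cm", "20", "80%", "10", "30cm")
```

Note that \\S* also keeps punctuation written directly after a unit, so "10cm," would be extracted with its comma.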

If the list elements all have the same length, we can rbind them into a matrix

m1 <- do.call(rbind, lst)
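The same-length condition matters because rbind() does not pad shorter rows: it recycles them, and it stays silent when the longer length is an exact multiple of the shorter one. A small sketch with two made-up match vectors of lengths 4 and 2 shows the duplicated values this produces:

```r
# Made-up matches from two strings with different numbers of ranges
lst <- list(c("20", "80%", "3", "10cm"), c("10", "30cm"))
m1 <- do.call(rbind, lst)

dim(m1)   # 2 rows, 4 columns
m1[2, ]   # the short row was recycled: "10" "30cm" "10" "30cm"
```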

paste the alternating (odd and even) columns together

v1 <- paste(m1[,c(TRUE, FALSE)], m1[,c(FALSE, TRUE)])
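The c(TRUE, FALSE) and c(FALSE, TRUE) indices rely on R recycling a logical index along the columns, so they select the odd and even columns respectively. The sketch below uses a hypothetical two-row m1 (the second row is invented for illustration) to show that paste() then returns a plain vector in column-major order, which is why the answer resets its dimensions afterwards:

```r
# First row from the question, second row made up for illustration
m1 <- rbind(c("20", "80%", "3", "10cm"),
            c("5",  "50%", "1", "2cm"))

# Logical column indices are recycled: c(TRUE, FALSE) keeps columns 1, 3, ...
# and c(FALSE, TRUE) keeps columns 2, 4, ...
lower <- m1[, c(TRUE, FALSE)]   # 2 x 2 matrix of lower bounds
upper <- m1[, c(FALSE, TRUE)]   # 2 x 2 matrix of upper bounds with units

# paste() works element-wise and drops the dimensions (column-major order)
v1 <- paste(lower, upper)
v1
# c("20 80%", "5 50%", "3 10cm", "1 2cm")
```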

and convert the result back into a matrix

dim(v1) <- c(nrow(m1), ncol(m1)/2)
v1
#     [,1]     [,2]     [,3]     [,4]     
#[1,] "20 80%" "3 10cm" "20 80%" "10 30cm"
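If the strings can contain different numbers of ranges, one way around the rbind() step is to pair the matches within each string separately. The helper pair_values() below is a hypothetical name for this sketch; it uses base R in place of str_extract_all() and assumes each string yields an even number of matches (a lower bound followed by an upper bound with its unit).

```r
# Hypothetical helper: pair consecutive matches within each string
pair_values <- function(strings) {
  # One character vector of matches per input string
  matches <- regmatches(strings, gregexpr("\\d+\\S*", strings, perl = TRUE))
  lapply(matches, function(v) {
    # Odd positions are lower bounds, even positions are upper bounds
    paste(v[c(TRUE, FALSE)], v[c(FALSE, TRUE)])
  })
}

x2 <- c("20 to 80% of the sward should be between 3 and 10cm tall",
        "with 20 to 80% of the sward between 10 and 30cm tall",
        "one range only: 5 to 15cm")
res <- pair_values(x2)
# list(c("20 80%", "3 10cm"), c("20 80%", "10 30cm"), "5 15cm")
```

Returning a list rather than a matrix means no padding or recycling is needed when the strings differ in length.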

Answer 1 (score: 0)

Not particularly elegant, but...

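library(magrittr)  # %>% and %$%
library(stringr)   # str_split()
library(dplyr)     # filter()
library(plyr)      # rename(), called as plyr::rename() below
# Start from the output of the gsub() call in the question
" 20  80%       3  10cm   20  80%     10  30cm " %>%
    str_split(" ") %>%
    unlist %>%                                   # character vector with many "" entries
    as.data.frame %>%                            # one column, named "."
    plyr::rename(replace = c("." = "string")) %$%
    gsub(string, replacement = "", pattern = " ") %>%   # %$% exposes the `string` column
    as.data.frame %>%
    plyr::rename(replace = c("." = "string")) %>%
    filter(string != "") -> etc_etc              # one row per non-empty token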
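The pipeline above stops at a data frame of individual tokens. A shorter base R sketch of the same idea, which also completes the pairing, starts from the same gsub() output, splits on runs of whitespace, and pastes consecutive tokens together:

```r
# Output of the gsub() call in the question
s <- " 20  80%       3  10cm   20  80%     10  30cm "

# Trim the ends, then split on runs of whitespace so no empty tokens remain
tokens <- strsplit(trimws(s), "\\s+")[[1]]

# Pair each lower bound with the following upper bound (and its unit)
result <- paste(tokens[c(TRUE, FALSE)], tokens[c(FALSE, TRUE)])
result
# c("20 80%", "3 10cm", "20 80%", "10 30cm")
```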