I have a series of strings like the following:
x <- " 20 to 80% of the sward should be between 3 and 10cm tall,
with 20 to 80% of the sward between 10 and 30cm tall"
I want to extract the numeric values and keep their units. I tried the following:
x <- lapply(x, function(x){gsub("[^\\d |cm\\b |mm\\b |% ]", "", x, perl = T)})
which gives:
" 20 80% 3 10cm 20 80% 10 30cm "
What I need is:
"20 80%" "3 10cm" "20 80%" "10 30cm"
Thanks for reading.
Answer 0 (score: 3)
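We can use str_extract_all from library(stringr) to extract the elements that match the pattern (modified following @PierreLafortune's comment). The pattern '\\d+\\S*' matches a run of digits followed by any non-whitespace characters, so a unit such as % or cm stays attached to its number.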
library(stringr)
lst <- str_extract_all(x, '\\d+\\S*')
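For reference, the same extraction can be sketched in base R, in case stringr is not available. This is only an illustration: the name tokens is not from the original answer, and perl = TRUE is an assumption made here so that \\d and \\S are read with PCRE semantics.

```r
x <- " 20 to 80% of the sward should be between 3 and 10cm tall,
with 20 to 80% of the sward between 10 and 30cm tall"

# gregexpr() locates every match of the pattern in each string;
# regmatches() then pulls out the matched substrings as a list,
# with one character vector per input string.
tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))

# Each number is returned together with any unit attached to it.
tokens[[1]]
# "20" "80%" "3" "10cm" "20" "80%" "10" "30cm"
```

Like str_extract_all, this returns a list, so the later steps that operate on lst should work on tokens in the same way.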
If the list elements all have the same length, we can rbind them to create a matrix.
m1 <- do.call(rbind, lst)
Then paste the alternating columns together (odd-numbered columns with even-numbered columns):
v1 <- paste(m1[,c(TRUE, FALSE)], m1[,c(FALSE, TRUE)])
and convert the result back to a matrix.
dim(v1) <- c(nrow(m1), ncol(m1)/2)
v1
# [,1] [,2] [,3] [,4]
#[1,] "20 80%" "3 10cm" "20 80%" "10 30cm"
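The rbind step relies on every string yielding the same number of matches. When that does not hold, one option is to pair the tokens inside each list element instead of building a matrix. The following is a minimal sketch under that assumption; pair_up and the sample vector xs are hypothetical names introduced here, not part of the answer above.

```r
# Hypothetical helper: join consecutive tokens (1st with 2nd, 3rd with 4th, ...).
pair_up <- function(tokens) {
  # An odd number of tokens cannot be paired cleanly, so fail early.
  stopifnot(length(tokens) %% 2 == 0)
  odd <- seq_len(length(tokens) %/% 2) * 2 - 1
  paste(tokens[odd], tokens[odd + 1])
}

# Two made-up strings with different numbers of matches.
xs <- c("20 to 80% between 3 and 10cm", "between 10 and 30cm tall")

tokens <- regmatches(xs, gregexpr("\\d+\\S*", xs, perl = TRUE))
paired <- lapply(tokens, pair_up)
paired
# [[1]] "20 80%" "3 10cm"
# [[2]] "10 30cm"
```

The result stays a list with one element per input string, which avoids padding shorter rows to fit a matrix.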
Answer 1 (score: 0)
Not particularly elegant, but...
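library(magrittr)  # provides %>% and %$%
library(stringr)
library(plyr)      # loaded before dplyr so plyr does not mask dplyr functions
library(dplyr)
" 20 80% 3 10cm 20 80% 10 30cm " %>%
  str_split(" ") %>%                                  # split on single spaces
  unlist %>%
  as.data.frame %>%
  plyr::rename(replace = c("." = "string")) %$%       # expose the column by name
  gsub(string, replacement = "", pattern = " ") %>%   # drop any leftover spaces
  as.data.frame %>%
  plyr::rename(replace = c("." = "string")) %>%
  filter(string != "") -> etc_etc                     # keep only non-empty tokens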
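The pipeline above stops at one token per row. As a rough sketch of how the split-based idea could be carried through to the paired output the question asks for, the tokens can be laid out two per row in a matrix. This uses base R only, and the names cleaned, tokens and pairs are illustrative rather than taken from the answer.

```r
cleaned <- " 20 80% 3 10cm 20 80% 10 30cm "

# Trim the outer spaces, then split on runs of whitespace so that
# no empty tokens are produced.
tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]

# Fill a two-column matrix row by row: each row holds one value pair.
# This assumes an even number of tokens.
pairs <- matrix(tokens, ncol = 2, byrow = TRUE)

result <- paste(pairs[, 1], pairs[, 2])
result
# "20 80%" "3 10cm" "20 80%" "10 30cm"
```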