My strings look like this:
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.", "Matthew got a score of 7.6")
The desired output is c(8.8, 7.6): for each string, the sum of the numbers that follow the "score" pattern.
I tried:
s <- as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\1"), test$Purpose)) +
as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\2"), test$Purpose))
However, this returns NAs.
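A quick way to see where the NAs come from is to test whether the anchored pattern matches at all. This sketch assumes `test` is the plain character vector shown above. The attempt reads `test$Purpose`, which only works if `test` is a data frame with a `Purpose` column. When gsub() finds no match it returns the input unchanged, and as.numeric() then turns each whole sentence into NA.

```r
# Assumption: `test` is the character vector from the question, not a data frame
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

pat <- "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$"

# Neither string matches: the first ends in "4th." (a digit where the trailing
# \\D*$ allows only non-digits), and the second has no second "scor" after 7.6
grepl(pat, test)
#> [1] FALSE FALSE

# With no match, gsub() hands back the original strings unchanged...
identical(gsub(pat, "\\1", test), test)
#> [1] TRUE

# ...and converting a whole sentence to a number gives NA (plus a coercion warning)
suppressWarnings(as.numeric(gsub(pat, "\\1", test)))
#> [1] NA NA
```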
Answer (score: 2)
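We can use a regex with lookbehinds to extract the numbers that follow "score of " or "scored ", and then sum them for each string: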
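library(stringr)
# extract the numbers preceded by "score of " or "scored " (one vector per string),
# then convert each vector to numeric and sum it
sapply(str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"),
       function(x) sum(as.numeric(x)))
#[1] 8.8 7.6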
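If you would rather avoid the stringr dependency, the same lookbehind idea can be sketched in base R with gregexpr(perl = TRUE) and regmatches(). This is a swapped-in alternative, not part of the original answer. It produces the same list of character vectors, one per input string.

```r
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

# PCRE lookbehinds: match a run of digits/dots only when it directly follows
# "score of " or "scored " (the lookbehind text itself is not part of the match)
pat <- "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"

# one character vector per string: c("4.5", "4.3") and "7.6"
matches <- regmatches(test, gregexpr(pat, test, perl = TRUE))

# convert each vector to numeric and sum it; vapply() guarantees a numeric result
totals <- vapply(matches, function(x) sum(as.numeric(x)), numeric(1))
totals
#> [1] 8.8 7.6
```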
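Or, with the tidyverse. Note that this pattern picks up every standalone number, not only the scores; the "4" in "4th" is skipped only because it is not followed by a word boundary: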
library(dplyr)
library(purrr)
# \\b[0-9.]+\\b matches any number that stands as a whole word
str_extract_all(test, "\\b[0-9.]+\\b") %>%
  map_dbl(~ as.numeric(.x) %>% sum)
#[1] 8.8 7.6
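To see how the word-boundary pattern differs from the lookbehind version, here is a sketch using a hypothetical second string (not from the question) that contains a number which is not a score. Both approaches agree on the original text but diverge as soon as another standalone number appears.

```r
library(stringr)
library(purrr)

# Hypothetical input: the second string has a non-score number ("2 subjects")
x <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
       "Anna got a score of 6.1 in 2 subjects")

# every standalone number is summed, including the "2"
all_nums <- map_dbl(str_extract_all(x, "\\b[0-9.]+\\b"), ~ sum(as.numeric(.x)))
all_nums
#> [1] 8.8 8.1

# only numbers directly after "score of " or "scored " are summed
score_nums <- map_dbl(str_extract_all(x, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"),
                      ~ sum(as.numeric(.x)))
score_nums
#> [1] 8.8 6.1
```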
Or, if we only want the numbers that follow "score":
str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+") %>%
  map_dbl(~ as.numeric(.x) %>% sum)
#[1] 8.8 7.6
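One edge case worth checking: a string with no matching score yields an empty vector, and sum() of an empty vector is 0. The sketch below uses a hypothetical input to show this. It also shows one way to return NA instead, assuming that "no score" should be distinguishable from a score of 0.

```r
library(stringr)
library(purrr)

# Hypothetical input: the second string contains no "score of"/"scored" number
x <- c("Matthew got a score of 7.6", "Nobody was graded")

hits <- str_extract_all(x, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+")

# sum(numeric(0)) is 0, so a string without any score yields 0
zero_default <- map_dbl(hits, ~ sum(as.numeric(.x)))
zero_default
#> [1] 7.6 0.0

# if "no score" should be NA instead, check the length first
na_default <- map_dbl(hits, ~ if (length(.x) == 0) NA_real_ else sum(as.numeric(.x)))
na_default
#> [1] 7.6  NA
```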