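I am trying to extract the numeric values next to the keyword "High" in the text below (the items in bold), but I get this error:
"Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)): Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)"
The regex I am using is
"(?<=High\\s*>?=?\\s?)[\\d\\.]+[\\s\\-\\d\\.]+(?=\\s)"
It works in an online regex tester, but when I run the same pattern in RStudio I get the error above.
The text is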
Optimal <2.6 Desirable 2.6 - 3.3 Borderline high 3.4 - 4.0 High ***4.1 - 4.8*** Very high >=4.9
Desirable <5.2 Borderline high 5.2 - 6.1 High >= ***6.2***
Desirable <1.7 Borderline High 1.7 - 2.2 High ***2.3 - 4.4*** Very high >=4.5
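Note that I used double backslashes in R, even if only a single backslash is displayed here.
Can you help me?
Answer 0: (score: 0)
Sample data
I changed one "Borderline High" to "Borderline high". It was probably a typo.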
v <- c("Optimal <2.6 Desirable 2.6 - 3.3 Borderline high 3.4 - 4.0 High 4.1 - 4.8 Very high >=4.9",
"Desirable <5.2 Borderline high 5.2 - 6.1 High >= 6.2",
"Desirable <1.7 Borderline high 1.7 - 2.2 High 2.3 - 4.4 Very high >=4.5")
Code
library(dplyr)
library(stringr)
data.frame( text = v, stringsAsFactors = FALSE ) %>%
#Extract text between "High" and "Very", trim whitespace
dplyr::mutate( High = trimws( stringr::str_extract(text, "(?<=High).*(?=Very)") ) ) %>%
#If no text was extracted, take everything after "High" until the end
dplyr::mutate( High = ifelse( is.na( High ), trimws( stringr::str_extract(text, "(?<=High).*(?=$)") ), High ) ) %>%
dplyr::select( High )
Output
# High
# 1 4.1 - 4.8
# 2 >= 6.2
# 3 2.3 - 4.4
If "Borderline High" was not a typo, only take the value after "High" when the text before " High" does not end in a letter ([a-zA-Z]).
data.frame( text = v, stringsAsFactors = FALSE ) %>%
#Extract text between "High" and "Very", trim whitespace
dplyr::mutate( High = trimws( stringr::str_extract(text, "(?<=[^a-zA-Z] High).*(?=Very)") ) ) %>%
#If no text was extracted, take everything after "High" until the end
dplyr::mutate( High = ifelse( is.na( High ), trimws( stringr::str_extract(text, "(?<=[^a-zA-Z] High).*(?=$)") ), High ) ) %>%
dplyr::select( High )
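As for the error itself: stringr uses the ICU regex engine, which requires every look-behind to have a bounded maximum length, and the \\s* inside (?<=High\\s*...) is unbounded. Online testers often use a different engine, which would explain why the pattern works there. One way around it is to drop the variable-length look-behind and use a capture group instead. The following is only a sketch under my own assumptions: the pattern and the variable names (pattern, high) are mine, not from the question, and the fixed-length negative look-behind on "Borderline " assumes that is the only prefix you need to exclude.

```r
library(stringr)

# Original text from the question, including the "Borderline High" spelling
v <- c("Optimal <2.6 Desirable 2.6 - 3.3 Borderline high 3.4 - 4.0 High 4.1 - 4.8 Very high >=4.9",
       "Desirable <5.2 Borderline high 5.2 - 6.1 High >= 6.2",
       "Desirable <1.7 Borderline High 1.7 - 2.2 High 2.3 - 4.4 Very high >=4.5")

# (?<!Borderline )  fixed-length negative look-behind, so its length is bounded
# High\\s*          the keyword and any whitespace (outside the look-behind)
# (?:>=?\\s*)?      optional ">" or ">=" that is skipped, not captured
# ( ... )           capture group: a number, optionally followed by "- number"
pattern <- "(?<!Borderline )High\\s*(?:>=?\\s*)?(\\d+(?:\\.\\d+)?(?:\\s*-\\s*\\d+(?:\\.\\d+)?)?)"

# str_match() returns a matrix: column 1 is the full match, column 2 the first group
high <- str_match(v, pattern)[, 2]
high
```

With the three sample strings this should return "4.1 - 4.8", "6.2" and "2.3 - 4.4", which are the bold items in the question. If you prefer to keep a look-behind, replacing \\s* with a bounded quantifier such as \\s{0,3} should also satisfy the length limit, but I have not checked the rest of the original pattern against your data.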