I need to extract some numbers from a text. The text is
x <- "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;"
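The numbers to extract are 325 and 232. These numbers are inside brackets and sit at the end of a sentence. The other numbers should be excluded. I tried strsplit(text, "[A-Za-z]+"), but did not get what I needed.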
Answer 0 (score: 5)
Here is a stringi approach:
x <- "Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae; Claudii libidini, qui tum erat summo ne imperio, dederetur"
library(stringi)
stri_extract_all_regex(x, "(?<=[\\[(])\\d+(?=[\\])][.?!])")
## [[1]]
## [1] "325" "232"
Answer 1 (score: 4)
Another option, using base R:
r <- gregexpr("[[(]\\d+[])](?=\\.)", x, perl = TRUE)
(m <- regmatches(x, r)[[1]])
# [1] "(325)" "[232]"
as.integer(gsub("\\D", "", m))
# [1] 325 232
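Answer 2 (score: 3)
Here is an approach using strsplit. It relies on the wanted numbers being the 3rd and 4th elements of the split, so it only works for this exact string: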
> x <- 'Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;'
> strsplit(x, '[^0-9]+')[[1]][3:4]
## [1] "325" "232"
Or use base R to extract the values with a regular expression:
> regmatches(x, gregexpr('[[(]\\K\\d+(?=[])](?!,))', x, perl=TRUE))[[1]]
## [1] "325" "232"
Answer 3 (score: 0)
Using Python's re module:
import re
string="Lorem ipsum dolor sit amet[245], consectetur adipiscing (325). Deinde prima illa, quae in congressu[232]. solemus: Quid tu, inquit, huc? Sequitur disserendi ratio cognitioque 295. naturae;"
print(string)
# Digits preceded by "[" or "(" and followed by "]" or ")" plus a period
pattern = re.compile(r'(?<=[\[(])\d+(?=[\])]\.)')
result = pattern.findall(string)
print(result)  # ['325', '232']
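If the numbers are needed as integers, or a sentence may end in "?" or "!" rather than a period, the same lookaround idea can be wrapped in a small function. This is only a sketch: the helper name extract_bracketed_numbers and the shortened sample string are made up for illustration, and the choice of ".", "?" and "!" as sentence endings is an assumption borrowed from the stringi answer.

```python
import re

# Hypothetical helper, not taken from any of the answers above.
# Matches digits that directly follow "[" or "(" and are directly followed by
# "]" or ")" plus one sentence-ending character (".", "?" or "!").
BRACKETED_AT_SENTENCE_END = re.compile(r"(?<=[\[(])\d+(?=[\])][.?!])")


def extract_bracketed_numbers(text):
    """Return the matching numbers as integers, in order of appearance."""
    return [int(m) for m in BRACKETED_AT_SENTENCE_END.findall(text)]


# Shortened version of the question's text (assumed sample input).
sample = "amet[245], adipiscing (325). congressu[232]. cognitioque 295. naturae;"
print(extract_bracketed_numbers(sample))  # [325, 232]
```

Note that the pattern does not check that the brackets are a matching pair, so "(12]." would also be accepted; if that matters, two alternatives such as \(\d+\) and \[\d+\] would probably be needed instead of the character classes.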