Question

I have a list of references, for example:

references <- c(
  "Dumitru, T.A., Smith, D., Chang, E.Z., and Graham, S.A., 2001, Uplift, exhumation, and deformation in the Japanese Mt Everest, Paleozoic and Mesozoic tectonic evolution of central Africa: from continental assembly to intracontinental deformation: Journal of Neverland, v. 3, no. 192, p. 71-199.",
  "Dumitru, T.A., Smith, D., Chang, E.Z., and Graham, S.A., 2001, Uplift, exhumation, and deformation in the Japanese Mt Everest, Paleozoic and Mesozoic tectonic evolution of central Africa: from continental assembly to intracontinental deformation: Journal of Neverland, no. 3.",
  "Dumitru, T.A., Smith, D., Chang, E.Z., and Graham, S.A., 2001, Uplift, exhumation, and deformation in the Japanese Mt Everest, Paleozoic and Mesozoic tectonic evolution of central Africa: from continental assembly to intracontinental deformation: Journal of Neverland, p. 71-199."
)

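I tried (?<=:)(?.*)(?=(v\.)|(no\.)|(p\.)), but that regex returns text starting at "from continental assembly to intracontinental deformation: Journal of Neverland, v. 3, no. 192, ...", which is not what I want to extract. I also tried: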

(?<=:)(?:[^:].*?)(?=(, v\.)|(, no\.)|(, p\.))

I expect "Journal of Neverland", but it returns "from continental assembly to intracontinental deformation: Journal of Neverland".

Answer 1

Here we capture only the text after the last colon, up to the next comma, in a capture group:

stringr::str_match(references, ": ((?!:)[^,:]*),")[,2]
# [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"
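For readers without stringr, roughly the same capture-group idea can be sketched in base R. This is an illustrative equivalent, not part of the original answer. The vector refs is a shortened, hypothetical stand-in for references, with the same structure, so the snippet runs on its own.

```r
# Shortened stand-ins for the entries in `references` (hypothetical text)
refs <- c(
  "Smith, D., 2001, Main title: subtitle: Journal of Neverland, v. 3, no. 192, p. 71-199.",
  "Smith, D., 2001, Main title: subtitle: Journal of Neverland, no. 3.",
  "Smith, D., 2001, Main title: subtitle: Journal of Neverland, p. 71-199."
)

# regexec() reports the full match plus each capture group;
# perl = TRUE is needed because the pattern uses a lookahead
m <- regmatches(refs, regexec(": ((?!:)[^,:]*),", refs, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group
journal <- vapply(m, function(x) x[2], character(1))
journal
# [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"
```

A reference with no match yields NA here, because indexing an empty match with [2] returns NA.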

Answer 2

You can use

:\s*\K[^:]*?(?=,\s*(?:v|no|p)\.)

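See the regex demo.

Details

: - a colon
\s* - zero or more whitespace characters
\K - match reset operator, which discards the text matched so far from the result
[^:]*? - zero or more characters other than :, as few as possible (*? is lazy)
(?=,\s*(?:v|no|p)\.) - a positive lookahead that requires a comma, then zero or more whitespace characters, then v, no or p, followed by a literal dot, immediately to the right of the current location.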
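The role of [^:] in the breakdown above can be seen by swapping it for a lazy dot. The following is a minimal sketch on a shortened, hypothetical reference string (ref and the helper extract are illustrative names, not from the answer):

```r
# Shortened stand-in for one entry of `references` (hypothetical text)
ref <- "Smith, D., 2001, Main title: subtitle: Journal of Neverland, v. 3, no. 192, p. 71-199."

# Helper: return the first match of a PCRE pattern in `ref`
extract <- function(pattern) regmatches(ref, regexpr(pattern, ref, perl = TRUE))

# A lazy dot may cross a colon, so the match starts after the FIRST colon
lazy_dot <- extract(":\\s*\\K.*?(?=,\\s*(?:v|no|p)\\.)")
lazy_dot
# [1] "subtitle: Journal of Neverland"

# [^:] cannot cross a colon, so only the text after the LAST colon
# can reach the lookahead
no_colon <- extract(":\\s*\\K[^:]*?(?=,\\s*(?:v|no|p)\\.)")
no_colon
# [1] "Journal of Neverland"
```

Laziness alone does not help, because the regex engine still prefers the leftmost starting position. Excluding the colon is what forces the match to begin after the last one.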

In R (see the R demo online), using the references vector from the question:

regmatches(references, regexpr(":\\s*\\K[^:]*?(?=,\\s*(?:v|no|p)\\.)", references, perl=TRUE))
## => [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"

If you prefer a stringr-based solution, use

library(stringr)
str_extract(references, "(?<=:\\s)[^:]*?(?=,\\s*(?:v|no|p)\\.)")
## => [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"

This lookbehind expects exactly one whitespace character after the colon; a different pattern is needed if that whitespace can be zero or many characters.
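One possible way to handle zero or many whitespace characters after the colon is to drop the lookbehind and use a capture group with stringr::str_match, the function used in Answer 1. This is a sketch under that assumption, not necessarily the variant the answer had in mind. The vector refs_ws is a hypothetical test input.

```r
library(stringr)

# Hypothetical inputs: no whitespace, and two spaces, after the last colon
refs_ws <- c(
  "Smith, D., 2001, Main title: subtitle:Journal of Neverland, v. 3.",
  "Smith, D., 2001, Main title: subtitle:  Journal of Neverland, no. 3."
)

# \\s* consumes any whitespace after the colon; the capture group holds the
# journal title; the trailing part plays the role of the lookahead
journal <- str_match(refs_ws, ":\\s*([^:]*?),\\s*(?:v|no|p)\\.")[, 2]
journal
# [1] "Journal of Neverland" "Journal of Neverland"
```

The capture group avoids the need for a variable-length lookbehind, at the cost of indexing column 2 of the returned matrix.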

Answer 3

Here is a gsub solution:

gsub('.*: (.*?), (?=v|no|p).*','\\1', references, perl=TRUE)
# [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"
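One caveat with the substitution approach is that gsub returns its input unchanged when the pattern does not match, so a reference without a journal field would come back whole. A minimal sketch of a guard, assuming NA is the desired result in that case (refs and its second entry are hypothetical):

```r
# Hypothetical inputs: one matching reference and one without a journal field
refs <- c(
  "Smith, D., 2001, Main title: subtitle: Journal of Neverland, v. 3, no. 192, p. 71-199.",
  "Smith, D., 2001, A reference with no journal field."
)

pattern <- '.*: (.*?), (?=v|no|p).*'

# Check which entries match before substituting
hit <- grepl(pattern, refs, perl = TRUE)

# Keep the captured title for matches, NA otherwise
journal <- ifelse(hit, gsub(pattern, '\\1', refs, perl = TRUE), NA_character_)
journal
# [1] "Journal of Neverland" NA
```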

Alternatively, you can use strsplit:

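vapply(strsplit(references, ': *|, *', perl=TRUE),
       function (l) {
         # indices of the pieces that start a volume, number, or page field
         k <- which(startsWith(l, 'p. ') | startsWith(l, 'v. ') | startsWith(l, 'no. '))
         # the journal title is the piece just before the first such field
         k <- k[1] - 1
         return (l[k])
       }, character (1))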
# [1] "Journal of Neverland" "Journal of Neverland" "Journal of Neverland"
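To see why taking the piece before the first v., no. or p. field works, it can help to look at the split result for a single entry. The sketch below uses a shortened, hypothetical reference (ref, pieces and first_field are illustrative names):

```r
# Shortened stand-in for one entry of `references` (hypothetical text)
ref <- "Smith, D., 2001, Main title: subtitle: Journal of Neverland, v. 3, no. 192, p. 71-199."

# Split on a colon or a comma, each followed by optional spaces
pieces <- strsplit(ref, ': *|, *', perl = TRUE)[[1]]
pieces
# [1] "Smith"  "D."  "2001"  "Main title"  "subtitle"
# [6] "Journal of Neverland"  "v. 3"  "no. 192"  "p. 71-199."

# The first piece that looks like a volume, number, or page field
first_field <- which(grepl("^(v|no|p)\\. ", pieces))[1]

# The journal title sits immediately before it
journal <- pieces[first_field - 1]
journal
# [1] "Journal of Neverland"
```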
