Question

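I'm looking for a way to use a wildcard as part of the criteria for removing portions of a corpus. I couldn't find anything related to this on SO or Google.

Goal: analyze a large dataset of standardized notes in which employee input is broken up into sections of text.

Sample data: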

***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe

Portion to extract for analysis:

Detail:xxxxx Requested Action:xxxxxx

There may be more items before Detail. Also, Assigned to: may not be present.

Answer 1

It's hard to say without more examples and details, but you may want a regular expression with a positive lookahead and an optional group:

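library(stringr)

text <- c("***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe")

# One pattern per section: each lazy match (.*?) stops at the lookahead (?=...) for the next label.
# The trailing ? makes the group containing the "Assigned to:" lookahead optional.
str_extract_all(text, c("Detail(.*?)(?=Requested Action:)", "Requested Action:((.*?)(?=Assigned to:))?"))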

# [[1]]
# [1] "Detail: xxxxx "
# 
# [[2]]
# [1] "Requested Action: xxxxxx "
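
One caveat: when "Assigned to:" is missing, the optional group in the second pattern matches nothing, so only the label "Requested Action:" comes back. Below is a minimal sketch of one way to handle both cases with a single pattern. It assumes each note sits on one line and that "Assigned to:", when present, is the only label after "Requested Action:". The second note and the names notes, pattern and extracted are made up for illustration.

```r
library(stringr)

# Two hypothetical notes: one with "Assigned to:", one without.
notes <- c(
  "***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx Assigned to: John Doe",
  "***Date; Area: asdfwerqw Detail: xxxxx Requested Action: xxxxxx"
)

# Lazy match starting at "Detail:" that stops just before "Assigned to:"
# (and any whitespace in front of it) or, if that label is absent, at the end of the string.
pattern <- "Detail:.*?(?=\\s*Assigned to:|$)"

extracted <- str_extract(notes, pattern)
extracted
# [1] "Detail: xxxxx Requested Action: xxxxxx"
# [2] "Detail: xxxxx Requested Action: xxxxxx"
```

Because the match starts at "Detail:", any number of items before it is skipped automatically. If notes can span several lines, the pattern would need adjusting, since . does not match a newline by default.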
