Removing dashes and colons from specific parts of a text

Posted: 2018-07-11 14:40:54

Tags: r

I have a text file similar to this:

Section A - Blah blah
Random sentence.
Section B - Hello
Random sentence.
SECTION C - Random sentence
Random sentence.
SECTION D - Hi
Part A - Hey
PART B - howdy
Task 1: Blah
Task 2: Blah

I am trying to get:

Section A  Blah blah
Random sentence.
Section B  Hello
Random sentence.
SECTION C  Random sentence
Random sentence.
SECTION D  Hi
Part A  Hey
PART B  howdy
Task 1 Blah
Task 2 Blah

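I am trying to detect patterns in the text, such as "Section" (case-insensitive) followed by a letter, or "Task" followed by a number, and remove the punctuation on those lines. What is the best way to do this?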
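One way to express that rule in base R, offered as a sketch rather than the only approach: flag the lines that start with "Section"/"Part" plus a letter, or "Task" plus a number (ignoring case), then delete "-" and ":" only on those lines. The desired output also changes the "Part" lines, so the sketch assumes "Part" should be treated like "Section". The names x and is_header are hypothetical.

```r
# Lines from the sample file in the question
x <- c("Section A - Blah blah", "Random sentence.",
       "Section B - Hello", "Random sentence.",
       "SECTION C - Random sentence", "Random sentence.",
       "SECTION D - Hi", "Part A - Hey", "PART B - howdy",
       "Task 1: Blah", "Task 2: Blah")

# TRUE for lines starting with Section/Part + a single letter, or Task + a number,
# in any letter case (perl = TRUE so PCRE handles \\s and \\b)
is_header <- grepl("^(?:section|part)\\s+[a-z]\\b|^task\\s+[0-9]+",
                   x, ignore.case = TRUE, perl = TRUE)

# Delete every dash and colon, but only on the flagged lines
x[is_header] <- gsub("[-:]", "", x[is_header])

writeLines(x)
```

Because the whole flagged line is cleaned, this would also delete any other "-" or ":" on it, such as a hyphen inside the heading text; anchoring the replacement right after the label avoids that.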

1 answer:

Answer 0 (score: 4)

Edit: added a solution with more checks on top of the original one.

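library(magrittr)  # provides the %>% pipe

# Split the sample text into a character vector, one element per line
fd <- strsplit("Section A - Blah blah
Random sentence.
Section B - Hello
Random sentence.
SECTION C - Random sentence
Random sentence.
SECTION D - Hi
Part A - Hey
PART B - howdy
Task 1: Blah
Task 2: Blah", "\n")[[1]]

fd %>%
  gsub("(Section[^-]*)-(.*)", "\\1 \\2", .) %>%  # drop the dash after "Section ..."
  gsub("(Task[^:]*):(.*)", "\\1 \\2", .) %>%     # drop the colon after "Task ..."
  writeLines()

Output (the patterns are case-sensitive, so the SECTION, Part and PART lines are left unchanged):

Section A   Blah blah
Random sentence.
Section B   Hello
Random sentence.
SECTION C - Random sentence
Random sentence.
SECTION D - Hi
Part A - Hey
PART B - howdy
Task 1  Blah
Task 2  Blah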
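The patterns above are case-sensitive, so lines such as "SECTION D - Hi" and "Part A - Hey" keep their dash. A sketch that keeps the same capture-group idea but anchors at the start of each line and ignores case (the names lines and out are hypothetical, and treating "Part" like "Section" is an assumption based on the desired output):

```r
lines <- c("Section A - Blah blah", "Random sentence.",
           "Section B - Hello", "Random sentence.",
           "SECTION C - Random sentence", "Random sentence.",
           "SECTION D - Hi", "Part A - Hey", "PART B - howdy",
           "Task 1: Blah", "Task 2: Blah")

# Group 1 keeps the "Section X" / "Part X" label; the dash after it is dropped.
# "^" anchors at the start of each element; ignore.case covers SECTION/PART.
out <- sub("^((?:section|part)[ \\t]+[a-z])[ \\t]*-", "\\1 ", lines,
           ignore.case = TRUE, perl = TRUE)

# Same idea for "Task <number>:", keeping the label and dropping the colon
out <- sub("^(task[ \\t]+[0-9]+):", "\\1", out,
           ignore.case = TRUE, perl = TRUE)

writeLines(out)
```

Only the punctuation directly after the label is removed, so any other dash or colon later on the line survives.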


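The following may also help. Note that it deletes every dash and colon in the text, not only those on the Section/Task lines: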

gsub("-|:","",var)

Here is the sample data for var:

var <- c("Section A - Blah blah
Random sentence.
Section B - Hello
Random sentence.
SECTION C - Random sentence
Random sentence.
SECTION D - Hi
Part A - Hey
PART B - howdy
Task 1: Blah
Task 2: Blah")
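Since var is a single string with embedded newlines, "^" would normally match only at its very beginning. A sketch, assuming PCRE (perl = TRUE): the inline flags (?i) and (?m) make the match case-insensitive and let "^" anchor at every line, so only the labels lose their punctuation. The name out is hypothetical, and handling "Part" like "Section" is an assumption.

```r
# The sample text as one string with embedded newlines, like var above
var <- "Section A - Blah blah
Random sentence.
Section B - Hello
Random sentence.
SECTION C - Random sentence
Random sentence.
SECTION D - Hi
Part A - Hey
PART B - howdy
Task 1: Blah
Task 2: Blah"

# (?i): ignore case; (?m): let ^ match at the start of every line.
# Group 1 keeps the label; the dash that follows it is dropped.
out <- gsub("(?im)^((?:section|part)[ \\t]+[a-z])[ \\t]*-", "\\1 ", var,
            perl = TRUE)

# Same for "Task <number>:", dropping only the colon
out <- gsub("(?im)^(task[ \\t]+[0-9]+):", "\\1", out, perl = TRUE)

writeLines(out)
```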