Conditional replacement of special characters in R

Time: 2017-12-19 08:53:25

Tags: r regex gsub

I have a data frame df with a column df$col that contains the following entries:

I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired

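I would like to replace question marks that occur between letters with an apostrophe, and replace a question mark at the start of a string with nothing: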

I'm tired
You're tired
You're tired?
Are you tired?
I am tired

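2 answers:

Answer 0: (score: 2)

I would use sub for the question mark at the start and gsub for the others, because a string may contain several question marks between letters but only one at the start.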

gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?"
[5] "I am tired"   

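For a breakdown of the pattern, see https://regex101.com/r/jClVPg/1

Some explanation:

  • First capturing group (\\w):

    \\w matches any word character (equivalent to [a-zA-Z0-9_])

  • \\? matches the character ? literally (case sensitive)

  • Second capturing group (\\w):

    \\w matches any word character (equivalent to [a-zA-Z0-9_])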
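
To make the two steps easier to follow, here is a minimal sketch that splits the nested call into separate statements. The vector x is copied from the question; the names step1, step2, overlap_default and overlap_lookaround, as well as the input "a?b?c", are only illustrative and not part of the original answer. The last two lines look at a case the question does not contain: question marks separated by a single character, where the capture-group pattern consumes the shared character and a lookaround variant (assuming perl = TRUE is acceptable) does not.

```r
# Sample input taken from the question
x <- c("I?m tired", "You?re tired", "You?re tired?",
       "Are you tired?", "?I am tired")

# Step 1: remove a single question mark at the very start of each string
step1 <- sub("^\\?", "", x)

# Step 2: replace every question mark that sits between two word
# characters with an apostrophe, putting the captured characters back
step2 <- gsub("(\\w)\\?(\\w)", "\\1'\\2", step1)
print(step2)

# Hypothetical edge case: the "b" is consumed by the first match,
# so the second question mark has no word character left in front of it
overlap_default <- gsub("(\\w)\\?(\\w)", "\\1'\\2", "a?b?c")

# Lookarounds only assert the neighbours without consuming them
overlap_lookaround <- gsub("(?<=\\w)\\?(?=\\w)", "'", "a?b?c", perl = TRUE)
print(c(overlap_default, overlap_lookaround))
```

For the data in the question both patterns give the same result, so the lookaround form only matters if such closely spaced question marks can occur.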

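Answer 1: (score: 0)

We can use sub. The inner call replaces the first ? that is not at the end of the string with an apostrophe, and the outer call then removes an apostrophe left at the start of the string: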

df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?" "I am tired"    

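Here we assume that each string contains at most one ? to be replaced, as in the example. Otherwise, just replace the inner sub with gsub.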
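
As a rough illustration of the difference between the inner sub and gsub, the sketch below runs both on a made-up string with two in-word question marks. The input y and the names with_sub and with_gsub are hypothetical and not part of the original answer.

```r
# Hypothetical input with two question marks inside words and one at the end
y <- "I?m sure you?re tired?"

# Inner sub: only the first non-final question mark is replaced
with_sub <- sub("^'", "", sub("[?](?!$)", "'", y, perl = TRUE))

# Inner gsub: every non-final question mark is replaced
with_gsub <- sub("^'", "", gsub("[?](?!$)", "'", y, perl = TRUE))

print(with_sub)
print(with_gsub)
```

Note that [?](?!$) matches any question mark that is not the last character, so a question mark followed by a space in the middle of a string would also become an apostrophe. This does not arise in the sample data.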

Data

df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?", 
"Are you tired?", "?I am tired")), .Names = "col", 
 class = "data.frame", row.names = c(NA, -5L))