Question

I have a data frame df with a column df$col that contains the following entries:

I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired

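I want to replace each question mark that appears between letters with an apostrophe, and to remove a question mark at the start of a string (replace it with nothing):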

I'm tired
You're tired
You're tired?
Are you tired?
I am tired

Answer 1

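For the leading question mark I would use sub rather than gsub, as others do: a string may contain several question marks between word characters, but it can have only one at the very start.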

gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?"
[5] "I am tired"

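For an explanation, see https://regex101.com/r/jClVPg/1.

Some explanation:

First capturing group (\\w):

\\w matches any word character (equivalent to [a-zA-Z0-9_])
\\? matches the character ? literally (case sensitive)
Second capturing group (\\w):

\\w matches any word character (equivalent to [a-zA-Z0-9_])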
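
To see the two capturing groups at work, here is a minimal sketch on a made-up vector x (not part of the original data). It also shows an edge case the answer does not discuss: each match consumes the letters on both sides, so two question marks separated by a single letter cannot both match. The lookaround variant with perl = TRUE is only one possible alternative, not the answer's method; it checks the neighbouring letters without consuming them.

```r
# Hypothetical input, chosen for illustration only
x <- c("I?m tired", "?Why?", "a?b?c")

# Capture-group version from the answer: \\1 and \\2 put the letters back
res_groups <- gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", x))

# Lookaround variant (assumption: PCRE via perl = TRUE); the letters are
# checked but not consumed, so adjacent matches can share a letter
res_look <- gsub("(?<=\\w)\\?(?=\\w)", "'", sub("^\\?", "", x), perl = TRUE)

print(res_groups)
print(res_look)
```

In this sketch res_groups leaves the second question mark of "a?b?c" in place, while res_look replaces both.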

Answer 2

We can use sub:

df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired"      "You're tired"   "You're tired?"  "Are you tired?" "I am tired"

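Here we assume that each string contains at most one ? that needs replacing, as in the example. Otherwise, just replace the inner sub with gsub.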
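
As a rough sketch of that difference, the following compares the inner sub with an inner gsub on a made-up string (the input x and the variable names are assumptions for illustration). Note that the pattern [?](?!$) replaces any question mark that is not at the end of the string, not only those between letters, so a mid-sentence "? " would also become an apostrophe.

```r
# Hypothetical input with two question marks that need replacing
x <- "You?re tired, aren?t you?"

# Inner sub: only the first non-final ? is replaced
first_only <- sub("^'", "", sub("[?](?!$)", "'", x, perl = TRUE))

# Inner gsub: every ? that is not at the end of the string is replaced
every_one <- sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))

print(first_only)
print(every_one)
```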

Data

df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?", 
"Are you tired?", "?I am tired")), .Names = "col", 
 class = "data.frame", row.names = c(NA, -5L))
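
For readers who prefer not to use structure(), the same data can probably be built with data.frame(). The sketch below does that and then checks that both answers' expressions reproduce the desired output from the question. The names expected, ans1 and ans2 are introduced here for illustration only.

```r
# Same data built with data.frame(); stringsAsFactors = FALSE keeps col as character
df <- data.frame(
  col = c("I?m tired", "You?re tired", "You?re tired?",
          "Are you tired?", "?I am tired"),
  stringsAsFactors = FALSE
)

# Desired result, copied from the question
expected <- c("I'm tired", "You're tired", "You're tired?",
              "Are you tired?", "I am tired")

# Expression from Answer 1
ans1 <- gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))

# Expression from Answer 2
ans2 <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))

# Both should match the desired output; stopifnot() errors otherwise
stopifnot(identical(ans1, expected), identical(ans2, expected))
```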

Conditional replacement of special characters in R
