I would like to take a data frame df with a column df$col that contains the following entries:
I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired
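and replace the question marks that occur between letters with apostrophes, and the question marks that occur at the beginning of a string with nothing: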
I'm tired
You're tired
You're tired?
Are you tired?
I am tired
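Answer 0 (score: 2)
I would use sub for the question mark at the beginning and gsub for the others, because a string may contain several question marks between letters but only one at the beginning.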
gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?"
[5] "I am tired"
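For an explanation, see https://regex101.com/r/jClVPg/1.
Some explanation:
1st capturing group (\\w):
\\w matches any word character (equal to [a-zA-Z0-9_])
\\? matches the character ? literally (case sensitive)
2nd capturing group (\\w):
\\w matches any word character (equal to [a-zA-Z0-9_])
The replacement \\1'\\2 puts the two captured characters back with an apostrophe between them.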
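To make the two passes easier to follow, here is a minimal sketch that splits the one-liner into separate steps. The names x, step1 and step2 are only illustrative and do not appear in the answer; the input strings are copied from the question.

```r
# Sample strings copied from the question; `x` is an illustrative name.
x <- c("I?m tired", "You?re tired", "You?re tired?",
       "Are you tired?", "?I am tired")

# Step 1: drop a single question mark at the very start of each string.
step1 <- sub("^\\?", "", x)

# Step 2: replace every "?" that sits between two word characters
# with an apostrophe, keeping the characters on both sides.
step2 <- gsub("(\\w)\\?(\\w)", "\\1'\\2", step1)

print(step2)
```

A trailing question mark is left alone in both steps because it is neither at the start of the string nor followed by a word character.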
Answer 1 (score: 0)
We can use sub:
df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?" "I am tired"
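Here we assume that each string contains at most one ? that is not at the end of the string, as in the example. Otherwise, just replace the inner sub with gsub.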
df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?",
"Are you tired?", "?I am tired")), .Names = "col",
class = "data.frame", row.names = c(NA, -5L))
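As a rough illustration of the gsub variant suggested in the second answer, the sketch below runs both versions on strings with more than one in-word question mark. The vector y and the result names are hypothetical and are not taken from the original thread.

```r
# Hypothetical input with several question marks per string.
y <- c("I?m sure you?re tired?", "?She?s tired")

# Inner sub: only the first "?" that is not the last character is replaced.
with_sub <- sub("^'", "", sub("[?](?!$)", "'", y, perl = TRUE))

# Inner gsub: every "?" that is not the last character is replaced.
with_gsub <- sub("^'", "", gsub("[?](?!$)", "'", y, perl = TRUE))

print(with_sub)
print(with_gsub)
```

Note that the pattern [?](?!$) only checks that the question mark is not the last character, so with gsub a question mark followed by a space in the middle of a string would also become an apostrophe. If that matters for your data, the word-character pattern from the first answer may be the safer choice.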