Question

My problem should be easy to solve. I want to replace whole words in a string that start with a pattern.

> test <- "i really wasn aware and i wasnt aware at all. but i wasn't aware. just wasn't."

## this is what i want
> output
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't."

The best I have come up with so far is

# this is what i get, but it's not correct
> gsub("\\<wasn*.\\>", "wasn't", test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't't aware. just wasn't't."

I have really run out of ideas.

I would also be happy with an alternative output.

Edit: It seems my question was a bit too specific, so I am adding other test cases. Basically, I don't know what characters will follow "wasn", and I want to convert all of them to wasn't.

Answer 1

You can use the negative lookahead that Perl-compatible regexes provide, with the pattern wasn(?!')t*:

gsub("wasn(?!')t*","wasn't",test,perl=TRUE)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't."
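
As a quick check of the lookahead version, the sketch below runs it on the three spellings from the test string, plus one spelling borrowed from the later test cases (wasn45'e) to show where it stops helping. The variable names are only illustrative.

```r
# The three spellings that occur in the original test string.
words <- c("wasn", "wasnt", "wasn't")

# (?!') refuses to match when an apostrophe follows "wasn",
# so an already-correct "wasn't" is left untouched.
fixed <- gsub("wasn(?!')t*", "wasn't", words, perl = TRUE)
print(fixed)

# Anything other than an optional "t" after "wasn" is left in place.
messy <- gsub("wasn(?!')t*", "wasn't", "wasn45'e", perl = TRUE)
print(messy)
```

Lookaheads such as (?!') are a PCRE feature, which is why perl = TRUE is passed here.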

Or you can do this:

gsub("wasn'*t*","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't."

For the second desired output:

gsub("wasn'*t*[.]?","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't"

After the edit:

gsub("wasn[^. ]*","wasn't",test)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't. this wasn't meant to be. it wasn't simple"
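
The negated character class decides where the match stops. Here is a small sketch of that, assuming comma-terminated input that the question does not show; the wider class in the second call is an illustrative variant, not part of the answer above.

```r
x <- "he wasnt, really"

# [^. ]* stops only at a period or a space, so the comma is swallowed.
narrow <- gsub("wasn[^. ]*", "wasn't", x)

# Listing more punctuation in the negated class keeps it in the output.
wide <- gsub("wasn[^.,;:!? ]*", "wasn't", x)

print(c(narrow, wide))
```

Which characters belong in the class depends on the punctuation you expect to follow the word.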

Answer 2

I suggest a solution like this:

test <- c("i really wasn aware and i wasnt aware at all. but i wasn't aware. just wasn't. this wasn45'e meant to be. it wasn@'re simple", "Wasn&^$tt that nice?", "You say wasnmmmt?", "No, he wasn&#t#@$.", "She wasn%#@t##, I know.")
 gsub("\\b(wasn)\\S*\\b(?:\\S*(\\p{P})\\B)?", "\\1't\\2", test, ignore.case=TRUE, perl=TRUE)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't. this wasn't meant to be. it wasn't simple"
[2] "Wasn't that nice?"                                                                                                          
[3] "You say wasn't?"                                                                                                            
[4] "No, he wasn't."                                                                                                             
[5] "She wasn't, I know."

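See the online R demo.

This solution handles the cases where wasn* appears at the start of the string or is capitalized, and it does not replace trailing punctuation.

Pattern details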

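\\b - a word boundary
(wasn) - capturing group 1 (referenced later with \\1 in the replacement pattern): the wasn substring (case-insensitive because of ignore.case=TRUE)
\\S*\\b - any 0+ non-whitespace characters, followed by a word boundary
(?:\\S*(\\p{P})\\B)? - an optional non-capturing group, matching 1 or 0 occurrences of:
- \\S* - 0+ non-whitespace characters
- (\\p{P}) - capturing group 2 (referenced later with \\2 in the replacement pattern): any single punctuation character (not a symbol! \\p{P} is not equal to [:punct:]!) that is not followed by...
- \\B - a letter, digit, or _ (it is a non-word-boundary pattern).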
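
To make the breakdown easier to follow, the sketch below assembles the same pattern from its pieces with paste0() and applies it to three of the test strings from this answer. The names pattern, samples, and result are only illustrative.

```r
# Assemble the pattern from the pieces described above.
pattern <- paste0(
  "\\b(wasn)",   # group 1: the literal prefix, at a word boundary
  "\\S*\\b",     # rest of the token, backtracking to the last word boundary
  "(?:\\S*",     # optional tail: leftover non-whitespace characters ...
  "(\\p{P})",    # ... ending in group 2, a single punctuation character
  "\\B)?"        # ... that is not followed by a word character
)

samples <- c("You say wasnmmmt?", "No, he wasn&#t#@$.", "Wasn&^$tt that nice?")
result <- gsub(pattern, "\\1't\\2", samples, ignore.case = TRUE, perl = TRUE)
print(result)
```

When the optional group does not match, \\2 is empty, so nothing extra is appended after wasn't.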

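For messier strings (like She wasn%#@t##,$#^ I know.), where the punctuation can sit among other punctuation, you can use a custom bracket expression to restrict the punctuation to stop at, and add \\S* at the end:

gsub("\\b(wasn)\\S*\\b(?:\\S*([?!.,:;])\\S*)?", "\\1't\\2", test, ignore.case=TRUE, perl=TRUE)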

See the regex demo.
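
As a small check of the bracket-expression variant, this sketch applies it to the messy example string quoted above. The variable names are illustrative, and the punctuation set [?!.,:;] is the one chosen in this answer; adjust it to your data.

```r
messy <- "She wasn%#@t##,$#^ I know."

# Group 2 now accepts only the listed punctuation marks; the trailing
# \\S* consumes whatever junk follows that mark up to the next space.
cleaned <- gsub(
  "\\b(wasn)\\S*\\b(?:\\S*([?!.,:;])\\S*)?",
  "\\1't\\2",
  messy,
  ignore.case = TRUE,
  perl = TRUE
)
print(cleaned)
```

Here the comma is the only listed mark inside the token, so it is the one that is kept after wasn't.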

Answer 3

Why not keep it simple and replace any word starting with wasn with wasn't?

# paste0() adds no separator, so the first piece ends with a space
test2 <- paste0(
  "i really wasn aware and i wasnt aware at all. but i wasn't aware. just ",
  "wasn't. this wasn45'e meant to be. it wasn@'re simple"
)
gsub("wasn[^ ]*", "wasn't", test2)
[1] "i really wasn't aware and i wasn't aware at all. but i wasn't aware. just wasn't this wasn't meant to be. it wasn't simple"

If you need to handle capitalization, you can add ignore.case = TRUE to gsub().
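
One thing to watch with ignore.case = TRUE is that a fixed replacement string overwrites the original capitalization. The sketch below contrasts that with a variant that captures the prefix and reuses it; the capture-group variant and the sample strings are assumptions for illustration, not part of this answer.

```r
x <- c("Wasnt that nice", "he wasn aware")

# A fixed replacement string turns "Wasnt" into lowercase "wasn't".
lowered <- gsub("wasn[^ ]*", "wasn't", x, ignore.case = TRUE)

# Capturing the prefix and reusing it as \\1 keeps the original case.
kept <- gsub("(wasn)[^ ]*", "\\1't", x, ignore.case = TRUE)

print(lowered)
print(kept)
```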

Replace whole words starting with a pattern using gsub in R

3 answers: