Using R

Time: 2016-10-28 12:34:04

Tags: r regex

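I have two related questions about regular expressions in R:
[1]
I want to replace each substring made up of punctuation followed by a letter with that letter in uppercase. Examples: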

Dr_dre to: DrDre  
Captain.Spock to: CaptainSpock  
spider-man to: spiderMan  

[2]
I want to convert camel-case strings to lowercase strings with an underscore separator. For example:

EndOfFile to: End_of_file  
CamelCase to: Camel_Case  
ABC to: A_B_C  

Many thanks,
Kamashay

1 answer:

Answer 0: (score: 2)

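We can use sub. We match one or more punctuation characters ([[:punct:]]+) followed by a single character captured as a group ((.)). In the replacement, the backreference to the captured group (\\1) is converted to uppercase (\\U); \\U is only available with perl = TRUE.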

    sub("[[:punct:]]+(.)", "\\U\\1", str1, perl = TRUE)
    #[1] "DrDre"        "CaptainSpock" "spiderMan"
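
The call above uses sub, which replaces only the first match in each string. As a minimal sketch, assuming an input may contain several separators, gsub applies the same pattern to every match. The vector multi below is a made-up example and not part of the original data.

```r
# Hypothetical input with more than one punctuation separator
multi <- c("one_two.three-four", "Dr_dre")

# sub() stops after the first match in each element
first_only <- sub("[[:punct:]]+(.)", "\\U\\1", multi, perl = TRUE)

# gsub() replaces every match; \\U needs perl = TRUE
all_matches <- gsub("[[:punct:]]+(.)", "\\U\\1", multi, perl = TRUE)

print(first_only)
print(all_matches)
```

For the three strings in str1 both calls give the same result, since each string contains a single separator.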

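For the second case, we use regex lookarounds: we match the position that is preceded by a letter ((?<=[A-Za-z])) and followed by an uppercase letter ((?=[A-Z])), and insert _ there.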

    gsub("(?<=[A-Za-z])(?=[A-Z])", "_", str2, perl = TRUE)
    #[1] "End_Of_File" "Camel_Case"  "A_B_C"
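
The question's examples mix End_of_file with Camel_Case, so the intended casing is ambiguous. Below is a hedged sketch of two variants, assuming the goal is either a fully lowercase result or lowercasing only the letter after each inserted underscore. Neither variant is part of the original answer.

```r
str2 <- c("EndOfFile", "CamelCase", "ABC")

# Variant 1: insert the underscores as above, then lowercase everything
snake_lower <- tolower(gsub("(?<=[A-Za-z])(?=[A-Z])", "_", str2, perl = TRUE))

# Variant 2: capture an uppercase letter preceded by a lowercase letter
# and replace it with "_" plus its lowercase form (\\L needs perl = TRUE).
# Runs of capitals such as "ABC" are left unchanged by this pattern.
snake_after <- gsub("(?<=[a-z])([A-Z])", "_\\L\\1", str2, perl = TRUE)

print(snake_lower)
print(snake_after)
```

Variant 2 reproduces the End_of_file example from the question, while Variant 1 is the usual choice when plain snake_case is wanted.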

Data

    str1 <- c("Dr_dre", "Captain.Spock", "spider-man")
    str2 <- c("EndOfFile", "CamelCase", "ABC")