Is there a general way in R to remove a substring that starts with a non-capitalized letter?

Time: 2019-04-29 03:15:45

Tags: r string gsub

This is hard to describe in writing, but I am trying to find a general way to get from this:

 [1] "Nature's Corner, Inc.Grocery StoresHerbsBBB Rating: A+"        
 [2] "Peapod Pick-UpGrocery StoresFood Delivery Service"             
 [3] "Stop & ShopGrocery Stores"                                     
 [4] "WegmansGrocery Stores"                                      

to this:

 [1] "Nature's Corner, Inc."        
 [2] "Peapod Pick-Up"             
 [3] "Stop & Shop"                                     
 [4] "Wegmans"  

Is it possible to write this with gsub and a regular expression?

1 answer:

Answer 0: (score: 3)

Run this (where s is your character vector):

gsub(pattern = "([a-z.])[A-Z].*", replacement = "\\1", x = s)

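What this does: it finds a lowercase letter or a period (inside the character class, "." is a literal dot) that is immediately followed by an uppercase letter. It keeps that first character through the capture group \\1 and removes everything from the uppercase letter to the end of the string.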

Result:

[1] "Nature's Corner, Inc." "Peapod Pick-Up"        "Stop & Shop"           "Wegmans"

(with a narrower console)

[1] "Nature's Corner, Inc."
[2] "Peapod Pick-Up"       
[3] "Stop & Shop"          
[4] "Wegmans"  
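
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The vector s is copied from the question. The "McDonald'sFast Food" string is a hypothetical entry that is not in the original data. It is included only to show a limitation: the pattern cuts at the first lowercase-to-uppercase transition, even when that transition is inside the business name.

```r
# Sample input copied from the question
s <- c(
  "Nature's Corner, Inc.Grocery StoresHerbsBBB Rating: A+",
  "Peapod Pick-UpGrocery StoresFood Delivery Service",
  "Stop & ShopGrocery Stores",
  "WegmansGrocery Stores"
)

# Keep the lowercase letter or period (group 1) and drop everything
# from the uppercase letter that follows it to the end of the string
cleaned <- gsub(pattern = "([a-z.])[A-Z].*", replacement = "\\1", x = s)
print(cleaned)

# Hypothetical entry (not from the question): the first lowercase-to-uppercase
# transition is "cD", so the name itself is truncated
truncated <- gsub("([a-z.])[A-Z].*", "\\1", "McDonald'sFast Food")
print(truncated)  # "Mc"
```

Since every match runs to the end of the string, sub would give the same result as gsub here.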

Alternative:

If the part to remove always starts with "Grocery":

gsub(pattern = "Grocery.*", replacement = "", x = s)

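But this could also cut into a business name that itself contains "Grocery", for example turning "Mom and Pop's Grocery" into "Mom and Pop's " (with a trailing space).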
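
A small sketch of that pitfall, assuming a made-up entry x whose business name ends in "Grocery" and is followed by the usual "Grocery Stores" category text. This string is hypothetical and not taken from the question's data. The two patterns are compared on it side by side.

```r
# Hypothetical entry: the business name itself contains "Grocery"
x <- "Mom and Pop's GroceryGrocery Stores"

# Fixed-word pattern: matches the first "Grocery", so the name loses it
by_word <- gsub(pattern = "Grocery.*", replacement = "", x = x)
print(by_word)  # "Mom and Pop's "

# Case-transition pattern: cuts at "yG", so the name stays intact
by_case <- gsub(pattern = "([a-z.])[A-Z].*", replacement = "\\1", x = x)
print(by_case)  # "Mom and Pop's Grocery"
```

Which pattern is safer depends on the data. The fixed-word version fails on names containing "Grocery", while the case-transition version fails on names with an internal capital right after a lowercase letter.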