How do I put spaces between the words in a list of strings?

Asked: 2019-03-16 14:52:31

Tags: r regex string

This is my current dataset:

c("Jetstar","Qantas", "QantasLink","RegionalExpress","TigerairAustralia", 
   "VirginAustralia","VirginAustraliaRegionalAirlines","AllAirlines", 
   "Qantas-allQFdesignatedservices","VirginAustralia-allVAdesignatedservices")

I want to add spaces to the airline names so that the individual words are separated.

To do this, I tried the following code:

airlines$airline <- gsub("([[:lower:]]) ([[:upper:]])", "\\1 \\2", airlines$airline)

But the text I get back is in the same format as before.

My desired output is as follows:

[image: desired output]

3 answers:

Answer 0: (score: 3)

txt <- c("Jetstar","Qantas", "QantasLink","RegionalExpress","TigerairAustralia", 
"VirginAustralia","VirginAustraliaRegionalAirlines","AllAirlines", 
"Qantas-allQFdesignatedservices","VirginAustralia-allVAdesignatedservices")

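You need two different rules: one for the spaces before case changes, and another for the recurring words ("designated", "services") or symbols ("-"). You can start with a pattern that identifies a lowercase character followed by an uppercase character (each identified with a character class such as "[A-Z]") and then insert a space between those two characters, which sit in two capture groups (formed by putting parentheses around a section of the pattern). See the Details section of ?regex for a quick description of character classes and capture groups: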

gsub("([a-z])([A-Z])", "\\1 \\2", txt)
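As a quick check, here is a minimal sketch of what this first step returns, next to the pattern from the question. The variable names `step1` and `unchanged` are made up for illustration. The pattern in the question has a literal space between the two groups, so it only matches a lowercase letter, then a space, then an uppercase letter, and nothing in the data contains a space:

```r
txt <- c("QantasLink", "VirginAustraliaRegionalAirlines",
         "Qantas-allQFdesignatedservices")

# Pattern from the question: the space between the groups must be matched
# literally, so no string changes.
unchanged <- gsub("([[:lower:]]) ([[:upper:]])", "\\1 \\2", txt)
identical(unchanged, txt)
# [1] TRUE

# Without the space in the pattern, every lowercase-to-uppercase boundary is split.
step1 <- gsub("([a-z])([A-Z])", "\\1 \\2", txt)
step1
# [1] "Qantas Link"
# [2] "Virgin Australia Regional Airlines"
# [3] "Qantas-all QFdesignatedservices"
```

The last string still needs work, because "-", "all", "designated" and "services" are not marked by a case change.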

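Then use that result as the argument to a second call that adds a space before any of the recurring words in the text that you also want separated: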

gsub("(-|all|designated|services)", " \\1", # second pattern and sub for "specials"
gsub("([a-z])([A-Z])", "\\1 \\2", txt))  #first pattern and sub for case changes

 [1] "Jetstar"                                      
 [2] "Qantas"                                       
 [3] "Qantas Link"                                  
 [4] "Regional Express"                             
 [5] "Tigerair Australia"                           
 [6] "Virgin Australia"                             
 [7] "Virgin Australia Regional Airlines"           
 [8] "All Airlines"                                 
 [9] "Qantas - all QF designated services"          
[10] "Virgin Australia - all VA designated services"
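If you want to apply this to a data frame column, as in the question, the two calls can be wrapped in a small function. This is only a sketch: the function name `split_airline_names` and its `specials` argument are made up for illustration, and the entries in `specials` are used as regular expressions without any escaping.

```r
# Hypothetical helper; the name and the `specials` argument are illustrative only.
split_airline_names <- function(x,
                                specials = c("-", "all", "designated", "services")) {
  # Step 1: space between a lowercase letter and the uppercase letter after it.
  x <- gsub("([a-z])([A-Z])", "\\1 \\2", x)
  # Step 2: space before each recurring word or symbol.
  pattern <- paste0("(", paste(specials, collapse = "|"), ")")
  gsub(pattern, " \\1", x)
}

# Small stand-in for the data frame in the question.
airlines <- data.frame(
  airline = c("QantasLink", "Qantas-allQFdesignatedservices"),
  stringsAsFactors = FALSE
)
airlines$airline <- split_airline_names(airlines$airline)
airlines$airline
# [1] "Qantas Link"                         "Qantas - all QF designated services"
```

Note that the second rule is purely textual: "all" would also be split off inside a longer lowercase word that happens to contain it, so the list of special words has to be chosen with the actual data in mind.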

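I see someone upvoted my earlier answer to Splitting CamelCase in R, which is similar to this one, but this question has quite a few more wrinkles to iron out.

Answer 1: (score: 1)

This (almost) does it: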

gsub("([A-Z])", " \\1", airlines) 

Borrowed from: splitting-camelcase-in-r

Of course, names such as Qantas-allQFd… will still be a problem, because the second part of that string contains two consecutive uppercase letters ("QF").
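A minimal sketch of what that looks like on two of the names (the variable names `x` and `res` are made up for illustration). Because a space goes in front of every uppercase letter, each result also starts with a space, which base R's `trimws()` can remove:

```r
x <- c("QantasLink", "Qantas-allQFdesignatedservices")

# A space is inserted before every uppercase letter, including the first one.
res <- gsub("([A-Z])", " \\1", x)
res
# [1] " Qantas Link"                     " Qantas-all Q Fdesignatedservices"

# trimws() drops the leading space, but "QF" stays split into "Q F".
trimws(res)
# [1] "Qantas Link"                     "Qantas-all Q Fdesignatedservices"
```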

Answer 2: (score: 1)

I tried to work this out and came up with the following:

library(stringr)

data_vec<- c("Jetstar","Qantas", "QantasLink","RegionalExpress","TigerairAustralia", 
  "VirginAustralia","VirginAustraliaRegionalAirlines","AllAirlines", 
  "Qantas-allQFdesignatedservices","VirginAustralia-allVAdesignatedservices")


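# The lookbehind (?<=...) is only supported by the PCRE engine, so perl = TRUE is required.
str_trim(gsub("(?<=[A-Z]{2})([a-z]{1})", " \\1", gsub("([A-Z]{1,2})", " \\1", data_vec), perl = TRUE))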
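To make the nested call easier to follow, here is a sketch of the same idea split into two steps. It uses base R's `trimws()` in place of `str_trim()`, and the names `step1` and `step2` are made up for illustration. The lookbehind in the second pattern needs `perl = TRUE`, since R's default regex engine rejects it.

```r
data_vec <- c("VirginAustraliaRegionalAirlines", "Qantas-allQFdesignatedservices")

# Step 1: space before every run of one or two uppercase letters.
step1 <- gsub("([A-Z]{1,2})", " \\1", data_vec)
step1
# [1] " Virgin Australia Regional Airlines" " Qantas-all QFdesignatedservices"

# Step 2: space before a lowercase letter that directly follows two uppercase letters.
step2 <- gsub("(?<=[A-Z]{2})([a-z])", " \\1", step1, perl = TRUE)

# Remove the leading space added in step 1.
trimws(step2)
# [1] "Virgin Australia Regional Airlines" "Qantas-all QF designatedservices"
```

This handles the case changes and the "QF" run, but "-", "all" and "services" are not separated, because nothing in either pattern targets them.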

I hope this helps.