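R - Split a character vector so that each unique element is added to a new character vector

Asked: 2016-01-21 15:26:00

Tags: regex r vector strsplit

I have a character vector in which individual elements contain multiple strings separated by commas. I obtained it by extracting a column from a data frame, and it looks like this: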

 [1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"                                                                              
 [2] "Ferncroft, Passaconaway, Paugus Mill"                                                                                                   
 [3] "Alexandria, South Alexandria"                                                                                                           
 [4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"                                                                                   
 [5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"                                                                 
 [6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
 [7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"                                                                                        
 [8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"                                                        
 [9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"                                                                    
[10] "Ashland" 

I would like to obtain a new character vector in which each of these strings is its own element, i.e.:

 [1] "Acworth", "Crescent Lake", "East Acworth", "Lynn", "South Acworth"                                                                              
 [6] "Ferncroft", "Passaconaway", "Paugus Mill", "Alexandria", "South Alexandria"

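I used the strsplit() function, but it returns a list. When I try to convert that list to a character vector, it reverts to the old state.

I'm sure this is a very simple problem - any help would be greatly appreciated! Thanks!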

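3 Answers:

Answer 0: (score: 4)

You can split the character vector with the "\\s*,\\s*" regex, which also removes the whitespace around each comma, and then unlist the result: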

# Sample data from the question
v <- c(
  "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth",
  "Ferncroft, Passaconaway, Paugus Mill",
  "Alexandria, South Alexandria",
  "Allenstown, Blodgett, Kenison Corner, Suncook (part)",
  "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow",
  "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands",
  "Amherst, Baboosic Lake, Cricket Corner, Ponemah",
  "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover",
  "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch",
  "Ashland"
)
# Split on commas with optional surrounding whitespace, then flatten the list
s <- unlist(strsplit(v, "\\s*,\\s*"))

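See the IDEONE demo.

The regex matches zero or more whitespace characters (\s*) on both sides of the comma (,), thus trimming the values. This also handles the case where a "wild" space appears before a comma in the initial character vector.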
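As a minimal sketch of the behaviour described above, the snippet below uses a made-up vector x (not from the question) with deliberately irregular spacing. It also shows that strsplit() returns a list, which is why unlist() is needed to get a flat character vector.

```r
# Hypothetical input: irregular spacing around the commas
x <- c("Acworth , Crescent Lake,East Acworth", "Lynn ,  South Acworth")

pieces <- strsplit(x, "\\s*,\\s*")  # one list element per input string
is.list(pieces)                     # TRUE

flat <- unlist(pieces)              # flatten into a single character vector
print(flat)
```

Whitespace at the very start or end of an input string is not next to a comma, so this pattern leaves it alone; trimws() can be applied afterwards if that matters.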

Answer 1: (score: 2)

Your post title suggests that you want unique strings, so

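unique(unlist(strsplit(myvec, split=",")))

or

unique(unlist(strsplit(myvec, split=", ")))

if a comma is always followed by a space. Splitting on "," alone leaves a leading space on every piece after the first.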
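To illustrate why the choice of separator matters for unique(), here is a small sketch with a made-up myvec (the names are taken from the question, but the vector itself is an assumption). With split = "," the leftover leading spaces make otherwise identical names look different.

```r
# Hypothetical input: the same two names in a different order
myvec <- c("Lynn, Acworth", "Acworth, Lynn")

# Splitting on "," keeps the space after each comma, so " Acworth" != "Acworth"
a <- unique(unlist(strsplit(myvec, split = ",")))
length(a)   # 4

# Splitting on ", " consumes the space as part of the separator
b <- unique(unlist(strsplit(myvec, split = ", ")))
length(b)   # 2

# Alternative: split on "," and trim each piece before de-duplicating
d <- unique(trimws(unlist(strsplit(myvec, split = ","))))
```

The trimws() variant is one way to cope with input where the space after the comma is not guaranteed.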

Answer 2: (score: 1)

As an alternative, you could also use scan, like this:

unique(scan(what = "", text = v, sep = ",", strip.white = TRUE))

The strip.white = TRUE part takes care of any leading or trailing whitespace you might have.

Note: "v" comes from this other answer.
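The scan approach can be tried on a smaller input as follows. This is only a sketch: the vector x is made up for illustration, and quiet = TRUE is added here (it is not in the answer) to suppress the "Read N items" message that scan prints by default.

```r
# Hypothetical input with uneven spacing and one repeated name
x <- c("Amherst,  Baboosic Lake ,Ponemah", "Ashland, Amherst")

# Each element of x is read as one line of comma-separated character fields
towns <- unique(scan(what = "", text = x, sep = ",",
                     strip.white = TRUE, quiet = TRUE))
print(towns)
```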