Dynamically concatenate multiple columns into a single delimiter-separated column

Time: 2018-02-16 12:31:32

Tags: r

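We have brand data spread across multiple columns, with column names such as "ID", "Gender", "Race", "Country", "VAR01", "VAR02", "VAR04", "VAR05", "VAR06", "VAR08", "VAR09", "VAR13", "VAR14", "VAR15", "VAR18", "VAR19". Our task is to concatenate the data in these brand columns into a single column, separated by a semicolon (;). We currently do this with the following code.

R code: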

dataset<-sapply(dataset,as.character)
# Replace "NA" with blank
dataset[is.na(dataset)] <- ""
dataset<-as.data.frame(dataset)

# Concatenate the column data with a semicolon (;)
# Column names are case sensitive, so they are written as VAR01, VAR02, ... to match the data
dataset$Var_All<-paste(dataset$VAR01,";",dataset$VAR02,";",dataset$VAR04,";",dataset$VAR05,";",dataset$VAR06,";",dataset$VAR08,";",dataset$VAR09,";",dataset$VAR13,";",dataset$VAR14,";",dataset$VAR15,";",dataset$VAR18,";",dataset$VAR19)

# Remove blank spaces before and after each semicolon
dataset$Var_All <- gsub(" ; ", ";", dataset$Var_All)
dataset$Var_All <- gsub("; ", ";", dataset$Var_All)
dataset$Var_All <- gsub(" ;", ";", dataset$Var_All)

# Replace multiple semicolons with one semicolon step by step
dataset$Var_All <- gsub(";;;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;", ";", dataset$Var_All)


#Remove beginning and ending semicolons if any 
dataset$Var_All<-gsub("^;+|;+$", "", dataset$Var_All)
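
The whitespace and semicolon cleanup above can be condensed into three calls. This is a minimal sketch, assuming a character vector `x` of made-up strings shaped like what `paste()` produces once the NAs have been blanked (it is not the real data): one pattern drops the spaces around each semicolon, and `;+` collapses a run of semicolons of any length.

```r
# Hypothetical example strings, shaped like the paste() output above
x <- c("Brand1 ; ; Brand2 ; ; Brand3", " ;  ; ", "; Brand2 ;;; Brand5 ;")

x <- gsub(" *; *", ";", x)    # drop spaces before and after each semicolon
x <- gsub(";+", ";", x)       # collapse a run of semicolons into one
x <- gsub("^;+|;+$", "", x)   # strip leading and trailing semicolons

print(x)
```

This still requires naming every column in `paste()`, so it only shortens the cleanup; it does not make the column selection dynamic.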

Dataset:

ID  Gender Race     Country VAR01   VAR02   VAR04   VAR05   VAR06   VAR08   VAR09   VAR13   VAR14   
1   Male   Indian   India   Brand1  NA      Brand2  NA      Brand3  NA      NA      NA      Brand4  
2   Female Indian   India   NA      NA      NA      NA      NA      NA      NA      NA      NA     
3   Male   Indian   India   NA      Brand2  NA      Brand3  NA      NA      Brand5  NA      NA     
4   Male   Indian   India   Brand1  NA      NA      NA      Brand3  NA      NA      NA      Brand4
5   Female Indian   India   NA      Brand2  NA      Brand4  NA      Brand6  NA      Brand7  NA  

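My question is: is there a way to match on a substring of the variable names, i.e. "VAR", and concatenate those variables into a single variable "VAR_All"? At the moment we list the columns by hand inside paste(), then strip the spaces and collapse the repeated semicolons to get the desired output.

We would like to know whether the code can be written dynamically, so that it finds however many variables start with "VAR" and concatenates them into "VAR_All" automatically.

The desired output is as follows: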

ID  VAR_All
1   Brand1;Brand2;Brand3;Brand4
2   
3   Brand2;Brand3;Brand5
4   Brand1;Brand3;Brand4
5   Brand2;Brand4;Brand6;Brand7

Thanks for your help.

1 answer:

Answer 0 (score: 0)

Use apply like this:

ix <- grep("^VAR", names(dataset))
Paste <- function(x) paste(na.omit(x), collapse = ";")
data.frame(ID = dataset$ID, VAR_All = apply(dataset[ix], 1, Paste))

giving:

  ID                     VAR_All
1  1 Brand1;Brand2;Brand3;Brand4
2  2                            
3  3        Brand2;Brand3;Brand5
4  4        Brand1;Brand3;Brand4
5  5 Brand2;Brand4;Brand6;Brand7
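
If the result should be added to the existing data frame rather than returned as a new one, and if some cells hold empty strings instead of NA (as they do after the question's `dataset[is.na(dataset)] <- ""` step), a variation along these lines may help. It is only a sketch: the small data frame and the helper name `concat_present` are made up for illustration.

```r
# Small made-up data frame; column names mirror the question's
dataset <- data.frame(
  ID    = 1:3,
  VAR01 = c("Brand1", NA, ""),
  VAR02 = c(NA, NA, "Brand2"),
  VAR04 = c("Brand2", NA, "Brand3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the values of one row that are neither NA nor ""
concat_present <- function(x) paste(x[!is.na(x) & x != ""], collapse = ";")

# Names of all columns that start with "VAR", however many there are
var_cols <- grep("^VAR", names(dataset), value = TRUE)

# apply() passes each row of the selected columns to the helper as a character vector
dataset$VAR_All <- apply(dataset[var_cols], 1, concat_present)

print(dataset[c("ID", "VAR_All")])
```

`var_cols` is computed before `VAR_All` is added, so re-running the `grep()` line afterwards would also pick up the new column; selecting with a stricter pattern such as `"^VAR[0-9]+$"` avoids that.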

Note

The input data used, in reproducible form:

Lines <- '
ID  Gender Race     Country VAR01   VAR02   VAR04   VAR05   VAR06   VAR08   VAR09   VAR13   VAR14   
1   Male   Indian   India   Brand1  NA      Brand2  NA      Brand3  NA      NA      NA      Brand4  
2   Female Indian   India   NA      NA      NA      NA      NA      NA      NA      NA      NA     
3   Male   Indian   India   NA      Brand2  NA      Brand3  NA      NA      Brand5  NA      NA     
4   Male   Indian   India   Brand1  NA      NA      NA      Brand3  NA      NA      NA      Brand4
5   Female Indian   India   NA      Brand2  NA      Brand4  NA      Brand6  NA      Brand7  NA  '
dataset <- read.table(text = Lines, header = TRUE)