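We have brand data spread across multiple columns, with column names such as "ID", "Gender", "Race", "Country", "VAR01", "VAR02", "VAR04", "VAR05", "VAR06", "VAR08", "VAR09", "VAR13", "VAR14", "VAR15", "VAR18", "VAR19". Our task is to concatenate the brand columns into a single column, with the values separated by a semicolon (;). At the moment we do this with the following code.
R code: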
dataset<-sapply(dataset,as.character)
# Replace "NA" with blank
dataset[is.na(dataset)] <- ""
dataset<-as.data.frame(dataset)
#Concatenate columns data with semicolon(;)
dataset$Var_All<-paste(dataset$VAR01,";",dataset$VAR02,";",dataset$VAR04,";",dataset$VAR05,";",dataset$VAR06,";",dataset$VAR08,";",dataset$VAR09,";",dataset$VAR13,";",dataset$VAR14,";",dataset$VAR15,";",dataset$VAR18,";",dataset$VAR19)
#Remove blank spaces before and after semicolon
dataset$Var_All <- gsub(" ; ", ";", dataset$Var_All)
dataset$Var_All <- gsub("; ", ";", dataset$Var_All)
dataset$Var_All <- gsub(" ;", ";", dataset$Var_All)
# Replace multiple semicolons with one semicolon step by step
dataset$Var_All <- gsub(";;;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;;", ";", dataset$Var_All)
dataset$Var_All <- gsub(";;", ";", dataset$Var_All)
#Remove beginning and ending semicolons if any
dataset$Var_All<-gsub("^;+|;+$", "", dataset$Var_All)
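The cleanup steps above can also be sketched more compactly. This is only an illustration, not the code from the question: it assumes a small hypothetical vector x of already-pasted strings, and it uses the regular expression ;+ to collapse a run of semicolons of any length in one call, instead of one gsub() call per run length.

```r
# Hypothetical pasted strings, standing in for dataset$Var_All after paste()
x <- c("Brand1 ; ; Brand2 ; ; Brand3", " ; ; ; ", "; Brand2 ;; Brand5 ;")

# Remove whitespace on either side of every semicolon in one pass
x <- gsub("[[:space:]]*;[[:space:]]*", ";", x)
# Collapse any run of semicolons into a single semicolon
x <- gsub(";+", ";", x)
# Drop leading and trailing semicolons
x <- gsub("^;+|;+$", "", x)
# Trim any whitespace left at the very start or end
x <- trimws(x)

print(x)
```

Because the pattern does not depend on how many columns were pasted, the same three substitutions keep working if more VAR columns are added later.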
Dataset:
ID Gender Race Country VAR01 VAR02 VAR04 VAR05 VAR06 VAR08 VAR09 VAR13 VAR14
1 Male Indian India Brand1 NA Brand2 NA Brand3 NA NA NA Brand4
2 Female Indian India NA NA NA NA NA NA NA NA NA
3 Male Indian India NA Brand2 NA Brand3 NA NA Brand5 NA NA
4 Male Indian India Brand1 NA NA NA Brand3 NA NA NA Brand4
5 Female Indian India NA Brand2 NA Brand4 NA Brand6 NA Brand7 NA
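My question is: is there a way to select the variables by a substring of their names, i.e. "VAR", and concatenate them into a single variable "VAR_All"? At the moment we list the columns by hand in paste(), then remove the spaces and collapse the repeated semicolons to get the desired output.
We would like to know whether the code can be written dynamically, so that it finds however many variables start with "VAR" and concatenates them into "VAR_All" automatically.
The desired output is as follows: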
ID VAR_All
1 Brand1;Brand2;Brand3;Brand4
2
3 Brand2;Brand3;Brand5
4 Brand1;Brand3;Brand4
5 Brand2;Brand4;Brand6;Brand7
Thanks for your help.
Answer 0 (score: 0)
Use apply like this:
ix <- grep("^VAR", names(dataset))
Paste <- function(x) paste(na.omit(x), collapse = ";")
data.frame(ID = dataset$ID, VAR_All = apply(dataset[ix], 1, Paste))
which gives:
ID VAR_All
1 1 Brand1;Brand2;Brand3;Brand4
2 2
3 3 Brand2;Brand3;Brand5
4 4 Brand1;Brand3;Brand4
5 5 Brand2;Brand4;Brand6;Brand7
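As a variation on the approach above, the result can be attached to the original data frame as a new column, and empty strings can be skipped along with NA values. The following is a minimal sketch under stated assumptions: the data frame df and the helper join_row are hypothetical names introduced for illustration, and the brand columns are assumed to be character columns.

```r
# Small hypothetical data frame standing in for the real dataset
df <- data.frame(
  ID    = 1:3,
  VAR01 = c("Brand1", NA, ""),
  VAR02 = c(NA, NA, "Brand2"),
  VAR04 = c("Brand2", NA, "Brand5"),
  stringsAsFactors = FALSE
)

# Names of all columns that start with "VAR", however many there are
var_cols <- grep("^VAR", names(df), value = TRUE)

# Join the non-missing, non-empty values of one row with ";"
join_row <- function(x) paste(x[!is.na(x) & x != ""], collapse = ";")

# The column names are captured in var_cols before VAR_All is created,
# so the new column is not among the columns being concatenated
df$VAR_All <- apply(df[var_cols], 1, join_row)

print(df[c("ID", "VAR_All")])
```

A row in which every VAR column is missing ends up with an empty string, which matches the blank entry for ID 2 in the desired output.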
The input data used, in reproducible form:
Lines <- '
ID Gender Race Country VAR01 VAR02 VAR04 VAR05 VAR06 VAR08 VAR09 VAR13 VAR14
1 Male Indian India Brand1 NA Brand2 NA Brand3 NA NA NA Brand4
2 Female Indian India NA NA NA NA NA NA NA NA NA
3 Male Indian India NA Brand2 NA Brand3 NA NA Brand5 NA NA
4 Male Indian India Brand1 NA NA NA Brand3 NA NA NA Brand4
5 Female Indian India NA Brand2 NA Brand4 NA Brand6 NA Brand7 NA '
dataset <- read.table(text = Lines, header = TRUE)