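I'm trying to analyze a large, sloppy, poorly coded data file of library reference interactions. Here is a small data set that captures what I'm trying to do: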
# assemble data
record<-c(2883823,2883824,2883825,2883826,2883828,2884074,2884076,2884660,2885106,2885222,2885703,2885709)
desk<-c("RRSS","RRSS","RRSS","RRSS","RRSS","RRSS","RRSS","Virt","RRSS","Virt","Virt","RRSS")
inperson<-c("InPerson<5Minutes",NA,NA,"InPerson<5Minutes",NA,NA,"InPerson<5Minutes",NA,"InPerson5-15Minutes",NA,NA,"InPerson15-30minutes")
phone<-c(NA,"Phone5-15Minutes","Phone<5Minutes",NA,NA,"Phone<5Minutes",NA,NA,NA,NA,NA,NA)
chat<-c(NA,NA,NA,NA,"Chat<5Minutes",NA,NA,"Chat5-15Minutes",NA,"Chat5-15Minutes","Chat15-30minutes",NA)
reference<-data.frame(record,desk,inperson,phone,chat) #create data frame
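I want to recode the different levels of the inperson, phone, and chat variables from strings to numbers (possibly under new names for clarity; I use the prefix Num below to indicate this). I think this will be some kind of if-then statement (although the strings in the input data are coded differently for each variable, so each one needs its own handling):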
record desk Numperson Numphone Numchat
2883823 RRSS 1 0 0
2883824 RRSS 0 2 0
2883825 RRSS 0 1 0
2883826 RRSS 1 0 0
2883828 RRSS 0 0 1
2884074 RRSS 0 1 0
2884076 RRSS 1 0 0
2884660 Virt 0 0 2
2885106 RRSS 2 0 0
2885222 Virt 0 0 2
2885703 Virt 0 0 3
2885709 RRSS 3 0 0
Then I want to rearrange it so that it is better suited for analysis, like this:
record desk type Numlevel
2883823 RRSS person 1
2883824 RRSS phone 2
2883825 RRSS phone 1
2883826 RRSS person 1
2883828 RRSS chat 1
2884074 RRSS phone 1
2884076 RRSS person 1
2884660 Virt chat 2
2885106 RRSS person 2
2885222 Virt chat 2
2885703 Virt chat 3
2885709 RRSS person 3
As a beginner, I'd greatly appreciate any help, or pointers to where I should be looking for an answer.
Answer 0 (score: 3)
Maybe something like this:
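# clean up: strip the channel prefix and the "Minutes"/"minutes" suffix,
# leaving only the duration label ("<5", "5-15", "15-30")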
reference$inperson <- gsub("InPerson|[Mm]inutes", "", reference$inperson)
reference$phone <- gsub("Phone|[Mm]inutes", "", reference$phone)
reference$chat <- gsub("Chat|[Mm]inutes", "", reference$chat)
# reshape to long format (one row per record and channel, NA rows dropped)
library(reshape2)
reference <- melt(reference, id.vars = c("record", "desk"),
                  variable.name = "type", value.name = "Numlevel",
                  na.rm = TRUE)
# match each duration label to its position in the ordered list (1, 2, 3)
reference$Numlevel <- match(reference$Numlevel, c("<5", "5-15", "15-30"))
# record desk type Numlevel
#1 2883823 RRSS inperson 1
#4 2883826 RRSS inperson 1
#7 2884076 RRSS inperson 1
#9 2885106 RRSS inperson 2
#12 2885709 RRSS inperson 3
#14 2883824 RRSS phone 2
#15 2883825 RRSS phone 1
#18 2884074 RRSS phone 1
#29 2883828 RRSS chat 1
#32 2884660 Virt chat 2
#34 2885222 Virt chat 2
#35 2885703 Virt chat 3
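If the intermediate wide table from the question (Numperson, Numphone, Numchat, with 0 for missing) is also wanted, or if reshape2 is not available, the same idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the helper `recode_duration` and the object names `wide` and `long` are made up for illustration, the three duration labels are the only ones that occur, and each record has at most one non-missing channel.

```r
# Rebuild the example data from the question.
reference <- data.frame(
  record   = c(2883823, 2883824, 2883825, 2883826, 2883828, 2884074,
               2884076, 2884660, 2885106, 2885222, 2885703, 2885709),
  desk     = c("RRSS", "RRSS", "RRSS", "RRSS", "RRSS", "RRSS",
               "RRSS", "Virt", "RRSS", "Virt", "Virt", "RRSS"),
  inperson = c("InPerson<5Minutes", NA, NA, "InPerson<5Minutes", NA, NA,
               "InPerson<5Minutes", NA, "InPerson5-15Minutes", NA, NA,
               "InPerson15-30minutes"),
  phone    = c(NA, "Phone5-15Minutes", "Phone<5Minutes", NA, NA,
               "Phone<5Minutes", NA, NA, NA, NA, NA, NA),
  chat     = c(NA, NA, NA, NA, "Chat<5Minutes", NA, NA, "Chat5-15Minutes",
               NA, "Chat5-15Minutes", "Chat15-30minutes", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from any package): strip the channel prefix and
# the "Minutes"/"minutes" suffix, then map the remaining duration label to
# 1, 2 or 3. Missing values become 0.
recode_duration <- function(x, prefix) {
  label <- gsub(paste0(prefix, "|[Mm]inutes"), "", x)
  level <- match(label, c("<5", "5-15", "15-30"))
  ifelse(is.na(level), 0L, level)
}

# Wide format: one numeric column per channel.
wide <- data.frame(
  record    = reference$record,
  desk      = reference$desk,
  Numperson = recode_duration(reference$inperson, "InPerson"),
  Numphone  = recode_duration(reference$phone, "Phone"),
  Numchat   = recode_duration(reference$chat, "Chat")
)

# Long format: stack the three channel columns, keep only the rows where an
# interaction was recorded, and restore the original record order.
long <- data.frame(
  record   = rep(wide$record, times = 3),
  desk     = rep(wide$desk, times = 3),
  type     = rep(c("person", "phone", "chat"), each = nrow(wide)),
  Numlevel = c(wide$Numperson, wide$Numphone, wide$Numchat)
)
long <- long[long$Numlevel > 0, ]
long <- long[order(long$record), ]
rownames(long) <- NULL
```

Compared with the melt() version, this labels the type as person rather than inperson and keeps the rows sorted by record, which is closer to the layout asked for in the question. A label outside the three listed durations would silently become 0 here, so on the full data it is worth checking table(reference$inperson) and the like first.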