I have a very messy dataset that looks like this:
Person1
Answer1 10 3
Answer2 4 12 5
Person2
Answer2 12 3 4 19 23
Answer3 3 14 22
I want to turn it into something like this:
Person1 Person2
Answer1 10
Answer2 4
Answer2 12
Answer2 5
Answer2 12
Answer2 3
Answer2 4
Answer2 19
Answer2 23
Answer3 3
Answer3 14
Answer3 22
I'm completely lost on this. I tried the following for loop to pull the data out of the original columns into a clean dataset:
for (i in 1:nrow(dat)) {
  for (j in 2:ncol(dat)) {
    if (!is.na(dat[i, j])) {
      dat.clean[i + 1, 2] <- dat[i, j]
      dat.clean[i, 1] <- dat[i, 1]
    } else {}
  }
}
But what I get back is complete garbage. Any help would be greatly appreciated!
Output:
X1 X2
1 5 NA
2 2 5
3 3 3
4 1 5
5 3 6
6 4 23
7 NA 22
Answer 0 (score: 1)
This is a bit convoluted, but it works with the example dataset (just save it as a CSV file).
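For anyone who wants to reproduce this, the sample could be written to disk roughly as follows. The comma-separated layout is an assumption on my part, since the question only shows the data separated by spaces; a file exported from a spreadsheet may also carry trailing empty fields, which the code below drops anyway.

```r
# Assumed CSV layout of the sample data from the question.
# The field separator (",") is a guess; the question does not show it.
sample_lines <- c(
  "Person1",
  "Answer1,10,3",
  "Answer2,4,12,5",
  "Person2",
  "Answer2,12,3,4,19,23",
  "Answer3,3,14,22"
)

# Write the lines to the file name used in the code that follows.
writeLines(sample_lines, "messydata.csv")
```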
# Read the file and drop empty lines.
txt <- readLines("messydata.csv")
txt <- txt[nzchar(txt)]

answer <- NULL   # answer label for every output row
Data <- list()   # one numeric vector per person
for (x in txt) {
  if (grepl("person", x, ignore.case = TRUE)) {
    # A "Person" line: remember whose answers follow.
    curr <- unlist(strsplit(x, ","))
    curr <- curr[nzchar(curr)]
  }
  if (grepl("answer", x, ignore.case = TRUE)) {
    # An "Answer" line: the first field is the label, the rest are values.
    y <- unlist(strsplit(x, ","))
    y <- y[nzchar(y)]
    answer <- c(answer, rep(y[1], length(y) - 1))
    value <- scan(text = y[-1], quiet = TRUE)
    Data[[curr]] <- c(Data[[curr]], value)
  }
}

# Pad each person's vector with NAs so that every column has length n
# and each person's values sit in their own block of rows.
n <- length(answer)
s <- 0L
for (i in seq_along(Data)) {
  d <- length(Data[[i]])
  Data[[i]] <- c(rep(NA, s), Data[[i]], rep(NA, n - s - d))
  s <- s + d
}

result <- data.frame(Answer = answer, do.call(cbind, Data))
result
# Answer Person1 Person2
#1 Answer1 10 NA
#2 Answer1 3 NA
#3 Answer2 4 NA
#4 Answer2 12 NA
#5 Answer2 5 NA
#6 Answer2 NA 12
#7 Answer2 NA 3
#8 Answer2 NA 4
#9 Answer2 NA 19
#10 Answer2 NA 23
#11 Answer3 NA 3
#12 Answer3 NA 14
#13 Answer3 NA 22
Final cleanup.
rm(txt, answer, Data)
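If the loops feel heavy, the same idea can probably be expressed by first building a long table (one row per value) and then spreading it into one column per person. This is only a sketch under the same assumption as above, namely comma-separated lines where every "Person" line is followed by that person's "Answer" lines. The names `is_person`, `long` and `wide` are mine, and the sample lines are given inline so the block runs on its own.

```r
# Sample lines in the assumed comma-separated layout.
txt <- c(
  "Person1",
  "Answer1,10,3",
  "Answer2,4,12,5",
  "Person2",
  "Answer2,12,3,4,19,23",
  "Answer3,3,14,22"
)

# Flag the "Person" lines and carry each label down over the lines after it.
is_person <- grepl("^person", txt, ignore.case = TRUE)
person <- txt[is_person][cumsum(is_person)]

# Split every answer line into its label (first field) and its values.
parts <- strsplit(txt[!is_person], ",")
n_values <- lengths(parts) - 1L

# Long format: one row per (person, answer, value).
long <- data.frame(
  Person = rep(person[!is_person], n_values),
  Answer = rep(vapply(parts, `[`, character(1), 1L), n_values),
  Value  = as.numeric(unlist(lapply(parts, `[`, -1L))),
  stringsAsFactors = FALSE
)

# Wide format: one column per person, NA where the row belongs to someone else.
wide <- data.frame(Answer = long$Answer, stringsAsFactors = FALSE)
for (p in unique(long$Person)) {
  wide[[p]] <- ifelse(long$Person == p, long$Value, NA)
}
wide
```

The `long` table is often the easier shape to work with afterwards (filtering, aggregating per answer), so it may be worth keeping even if the wide layout is what gets reported.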