Reorganizing messy data with unequal columns in R

Time: 2018-11-04 21:51:16

Tags: r dataframe data-cleaning

I have a very messy dataset that looks like this:

Person1
Answer1  10  3
Answer2  4   12  5
Person2
Answer2  12  3   4   19  23
Answer3  3   14  22  

I would like to turn it into this:

              Person1  Person2
Answer1       10
Answer2       4
Answer2       12
Answer2       5
Answer2                  12
Answer2                  3
Answer2                  4
Answer2                  19
Answer2                  23
Answer3                  3
Answer3                  14
Answer3                  22

I am completely lost on this. I tried the following for loop to pull the data from the original columns into a clean dataset:

for(i in 1:nrow(dat)){
  for(j in 2:ncol(dat)){
    if(!is.na(dat[i,j])){
      dat.clean[i+1,2]<-dat[i,j]
      dat.clean[i,1]<-dat[i,1]
    }else{}
  }
}

But I am getting complete garbage. Any help would be greatly appreciated!

Output:

  X1 X2
1    5 NA
2    2  5
3    3  3
4    1  5
5    3  6
6    4 23
7   NA 22

1 Answer:

Answer 0: (score: 1)

This is a bit convoluted, but it works with the example dataset (saved as a CSV file).

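# Read the file line by line and drop empty lines.
txt <- readLines("messydata.csv")
txt <- txt[nzchar(txt)]

answer <- NULL   # answer label for each value, in file order
Data <- list()   # one vector of values per person

for(x in txt){
  value <- NULL
  # A "Person" line starts a new block; remember whose block it is.
  if(grepl("person", x, ignore.case = TRUE)) {
    curr <- unlist(strsplit(x, ","))
    curr <- curr[nzchar(curr)]
  }
  # An "Answer" line holds a label followed by a variable number of values.
  if(grepl("answer", x, ignore.case = TRUE)){
    y <- unlist(strsplit(x, ","))
    y <- y[nzchar(y)]
    # Repeat the label once per value so labels and values stay aligned.
    answer <- c(answer, rep(y[1], length(y) - 1))
    value <- scan(text = y[-1])
    Data[[curr]] <- c(Data[[curr]], value)
  }
}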

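# Pad each person's vector with NA so that all columns have n rows
# and each person's values line up with their own answer labels.
n <- length(answer)
s <- 0L   # number of rows already taken by earlier persons
for(i in seq_along(Data)){
  d <- length(Data[[i]])
  Data[[i]] <- c(rep(NA, s), Data[[i]], rep(NA, n - s - d))
  s <- s + d
}

# Bind the padded vectors as columns next to the answer labels.
result <- data.frame(Answer = answer, do.call(cbind, Data))
result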
#    Answer Person1 Person2
#1  Answer1      10      NA
#2  Answer1       3      NA
#3  Answer2       4      NA
#4  Answer2      12      NA
#5  Answer2       5      NA
#6  Answer2      NA      12
#7  Answer2      NA       3
#8  Answer2      NA       4
#9  Answer2      NA      19
#10 Answer2      NA      23
#11 Answer3      NA       3
#12 Answer3      NA      14
#13 Answer3      NA      22
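
The staggered layout of the output comes from the padding loop: each person's values are preceded by as many NA as there are values from earlier persons, and followed by enough NA to reach the total length. A minimal sketch of that step in isolation, using a small made-up list (`toy` and `padded` are hypothetical names, and the values are illustrative only):

```r
# Toy input: two persons with 3 and 2 values (illustrative, not the real data)
toy <- list(Person1 = c(10, 3, 4), Person2 = c(12, 3))

n <- sum(lengths(toy))  # total number of rows in the final table
s <- 0L                 # rows already used by earlier persons

for (i in seq_along(toy)) {
  d <- length(toy[[i]])
  # s leading NAs, then the values, then trailing NAs up to length n
  toy[[i]] <- c(rep(NA, s), toy[[i]], rep(NA, n - s - d))
  s <- s + d
}

# Every vector now has length n, so they can be bound as columns
padded <- do.call(cbind, toy)
```

Here `Person1` occupies the first three rows and `Person2` the last two, so no two persons share a row.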

Final cleanup.

rm(txt, answer, Data)
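
For comparison, here is a sketch of a different route to the same table that does not need a file on disk. It is not the approach used above: it first builds a long table (one row per person, answer and value) and then spreads it into one column per person. It assumes the sample lines are comma-separated and that the first line is a "Person" header. The names `fields`, `first`, `person`, `long` and `wide` are introduced here for illustration only.

```r
# Sample data as comma-separated lines (assumed layout of the CSV file)
txt <- c(
  "Person1",
  "Answer1,10,3",
  "Answer2,4,12,5",
  "Person2",
  "Answer2,12,3,4,19,23",
  "Answer3,3,14,22"
)

# Split each line into its non-empty fields
fields <- lapply(strsplit(txt, ","), function(f) f[nzchar(f)])
first  <- vapply(fields, function(f) f[1], character(1))

# Header lines start with "Person"; each header applies to the lines below it
is_person <- grepl("^person", first, ignore.case = TRUE)
person    <- first[is_person][cumsum(is_person)]

# Long format: one row per (person, answer, value)
long <- do.call(rbind, lapply(which(!is_person), function(i) {
  data.frame(Person = person[i],
             Answer = fields[[i]][1],
             Value  = as.numeric(fields[[i]][-1]),
             stringsAsFactors = FALSE)
}))

# Wide format: one column per person, NA where the row belongs to someone else
wide <- data.frame(Answer = long$Answer, stringsAsFactors = FALSE)
for (p in unique(long$Person)) {
  wide[[p]] <- ifelse(long$Person == p, long$Value, NA)
}
wide
```

The intermediate `long` table is often the more convenient shape for further analysis; the wide layout is only needed if the staggered one-column-per-person display is really required.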