Question

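I have the following data frame and want to condense it, ignoring the NAs and grouping by column x, so that each value of x ends up as a single row.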

d <- data.frame(x=c("efg", "hij", "abc", "abc"), y=c("P","K",NA,"R"), z=c("J",NA,"L",NA))

I tried the following, but it does not seem to work:

library(plyr)
d_2 = ddply(d,.(x,na.omit(y),na.omit(z)),frequency)

Can anyone help?

Answer 1

This is probably not the most elegant code, but it does work for the limited test case above.

#Define the dataframe with strings and not factors
d <- data.frame(x=c("efg", "hij", "abc", "abc"), 
                y=c("P","K",NA,"R"), 
                z=c("J",NA,"L",NA), stringsAsFactors = FALSE)
d[is.na(d)] <- "" #replaces the NAs with empty strings
#aggregate the rows
out<-aggregate(d[2:3], by=list(d$x), FUN=toString)
names(out)[1]<-"x"  #Renames the first column
#Removes any commas added from the aggregated/toString command 
#and set the results back to a data frame
d<-data.frame(apply(out, MARGIN=c(1,2), FUN=gsub, pattern=", ", replacement=""))
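As a possible variation on the code above, the NAs can be dropped inside the aggregating function itself, which avoids both the empty-string replacement and the later gsub cleanup. This is only a sketch for the same test data; collapse_non_na is a helper name made up for this example, not something from base R.

```r
# Same test data as above, with strings rather than factors
d <- data.frame(x = c("efg", "hij", "abc", "abc"),
                y = c("P", "K", NA, "R"),
                z = c("J", NA, "L", NA), stringsAsFactors = FALSE)

# Hypothetical helper: keep the non-NA values of a group and paste them together.
# A group that holds only NAs collapses to an empty string.
collapse_non_na <- function(v) paste(v[!is.na(v)], collapse = "")

# One row per value of x; y and z are condensed column by column
out <- aggregate(d[c("y", "z")], by = list(x = d$x), FUN = collapse_non_na)
print(out)
```

If a group has more than one non-NA value in the same column, the values are simply concatenated, which is the same behaviour as removing the ", " separators in the code above.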

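The aggregate line can be adjusted to handle more columns. I originally wanted to use the group_by and summarise functions from the dplyr library, but I could not apply the operation across multiple columns.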
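For what it is worth, newer dplyr releases (1.0.0 and later, which is an assumption about the installed version) provide across(), which applies one function to several columns inside summarise. A minimal sketch on the same test data, where collapse_non_na is again a made-up helper name:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

d <- data.frame(x = c("efg", "hij", "abc", "abc"),
                y = c("P", "K", NA, "R"),
                z = c("J", NA, "L", NA), stringsAsFactors = FALSE)

# Hypothetical helper: paste together the non-NA values of a group
collapse_non_na <- function(v) paste(v[!is.na(v)], collapse = "")

out <- d %>%
  group_by(x) %>%
  # apply the helper to y and z within each group of x
  summarise(across(c(y, z), collapse_non_na), .groups = "drop")
print(out)
```

To cover more columns, the selection c(y, z) could be swapped for something like -x or everything(), depending on which columns should be condensed.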

Condense data and ignore empty cells
