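I have the following data frame. I want to collapse it, ignoring the NAs and grouping by column x, so that each value of x ends up with a single row.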
d <- data.frame(x=c("efg", "hij", "abc", "abc"), y=c("P","K",NA,"R"), z=c("J",NA,"L",NA))
I tried the following, but it does not give the result I want:
library(plyr)
d_2 = ddply(d,.(x,na.omit(y),na.omit(z)),frequency)
Can anyone help?
Answer 0 (score: 0)
This is probably not the most elegant code, but it does work for the limited test case you posted above.
#Define the dataframe with strings and not factors
d <- data.frame(x=c("efg", "hij", "abc", "abc"),
y=c("P","K",NA,"R"),
z=c("J",NA,"L",NA), stringsAsFactors = FALSE)
d[is.na(d)]<-"" #replaces the NAs with empty strings
#aggregate the rows
out<-aggregate(d[2:3], by=list(d$x), FUN=toString)
names(out)[1]<-"x" #Renames the first column
#Removes the ", " separators added by the aggregate/toString step
#(this also strips the separator between two real values in the same group)
#and sets the result back to a data frame
d<-data.frame(apply(out, MARGIN=c(1,2), FUN=gsub, pattern=", ", replacement=""))
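The aggregate line can be adjusted to handle more columns. I originally wanted to use the group_by and summarise functions from the dplyr library, but I could not get the operation to apply across multiple columns.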
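As a possible variation on the aggregate step, here is a minimal base R sketch that drops the NAs inside the aggregation function instead of replacing them with empty strings and stripping commas afterwards. The helper name collapse_non_na is made up for this illustration, and the choice to return NA when a group has no values at all is an assumption about the desired output.

```r
# Same sample data as in the question, with strings rather than factors
d <- data.frame(x = c("efg", "hij", "abc", "abc"),
                y = c("P", "K", NA, "R"),
                z = c("J", NA, "L", NA),
                stringsAsFactors = FALSE)

# Hypothetical helper: keep only the non-NA values of a group and join them.
# Returns NA when the group has no non-NA value (an assumption about the
# wanted result; return "" here instead if empty strings are preferred).
collapse_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA_character_ else paste(v, collapse = ", ")
}

# The data frame method of aggregate() passes NAs in the value columns
# through to FUN, so the helper sees them and can drop them itself.
# To cover more columns, extend the vector of column names.
out <- aggregate(d[c("y", "z")], by = list(x = d$x), FUN = collapse_non_na)
print(out)
```

With the sample data this should give one row each for abc, efg and hij, with z left as NA for hij. Unlike the gsub approach, two real values in the same group stay separated by ", " rather than being glued together.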