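I have an unstructured data frame in R: data that should be grouped into four columns is scattered across the data frame:
With the code below, I transpose the data, use subset to gather the non-missing values of each row, and then transpose the result back. The result is as follows:
However, I'm sure this could be done more efficiently with a loop.
Any suggestions for improving the steps below are appreciated. Ideally, I'd like to use a loop to gather all the columns that belong in a particular data frame.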
tmydata=t(mydata)
df=data.frame(tmydata)
# Each original row is now a column (X1..X4); keep its non-missing entries.
# Use !is.na(): X1!="NA" only worked because NA!="NA" is NA and subset() drops NA
firstrow=subset(df, !is.na(X1))
thefirstrow=firstrow[1]
secondrow=subset(df, !is.na(X2))
thesecondrow=secondrow[2]
thirdrow=subset(df, !is.na(X3))
thethirdrow=thirdrow[3]
fourthrow=subset(df, !is.na(X4))
thefourthrow=fourthrow[4]
df2=data.frame(thefirstrow,thesecondrow,thethirdrow,thefourthrow)
finaloutput=t(df2)
finaldata=data.frame(finaloutput)
col_headings <- c("A","B","C","D")
names(finaldata) <- col_headings
finaldata
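The four near-identical subset steps differ only in the row index, so they can be collapsed into one loop over the rows. A minimal sketch, assuming (as the question describes) that every row of mydata holds exactly four non-missing values; the small mydata below is made-up example data, not the asker's:

```r
# Hypothetical example data: each row has exactly four non-NA values,
# scattered across six columns
mydata <- data.frame(
  V1 = c(1, NA, 9),
  V2 = c(NA, 5, 10),
  V3 = c(2, 6, NA),
  V4 = c(3, NA, 11),
  V5 = c(NA, 7, 12),
  V6 = c(4, 8, NA)
)

# One loop iteration replaces one "subset ... / [i]" pair from the question:
# keep the non-missing values of row i, in their original order
rows <- vector("list", nrow(mydata))
for (i in seq_len(nrow(mydata))) {
  v <- unlist(mydata[i, ], use.names = FALSE)
  rows[[i]] <- v[!is.na(v)]
}

# Stack the equal-length rows back into a data frame
finaldata <- as.data.frame(do.call(rbind, rows))
names(finaldata) <- c("A", "B", "C", "D")
finaldata
```

No transposing is needed here, because each row is read and written as a row directly.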
Answer 0 (score: 0)
I think the following is what you want:
Some sample data:
set.seed(1234)
df = matrix(runif(32),4,8)
colnames(df) = LETTERS[1:8]
df[df<0.2]=NA
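Code:
library(plyr)
# unname() makes rbind.fill() stack the values by position; if the original
# column names were kept, it would re-align them under A..H instead of shifting left
df = rbind.fill(lapply(1:nrow(df), function(x) {as.data.frame(t(unname(df[x,][!is.na(df[x,])]))) }))
colnames(df) = LETTERS[1:ncol(df)]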
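If you'd rather not depend on plyr, the same left-shift-and-pad can be done in base R. A minimal sketch that rebuilds the sample data above as m; the names m, rows, width and padded are illustrative only. Rows with fewer non-missing values are padded with trailing NAs up to the longest row:

```r
# Same sample data as above, kept as a matrix
set.seed(1234)
m <- matrix(runif(32), 4, 8)
colnames(m) <- LETTERS[1:8]
m[m < 0.2] <- NA

# Non-missing values of each row, with the column names dropped
rows <- lapply(seq_len(nrow(m)), function(i) unname(m[i, !is.na(m[i, ])]))

# Pad every row with trailing NAs up to the length of the longest row;
# vapply() returns one column per row, so transpose back
width <- max(lengths(rows))
padded <- t(vapply(rows, function(v) c(v, rep(NA_real_, width - length(v))),
                   numeric(width)))

out <- as.data.frame(padded)
colnames(out) <- LETTERS[seq_len(width)]
out
```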
Input:
A B C D E F G H
[1,] NA 0.8609154 0.6660838 0.2827336 0.2862233 0.3166125 0.2187995 0.8313450
[2,] 0.6222994 0.6403106 0.5142511 0.9234335 0.2668208 0.3026934 0.8105986 NA
[3,] 0.6092747 NA 0.6935913 0.2923158 NA NA 0.5256975 0.4560915
[4,] 0.6233794 0.2325505 0.5449748 0.8372956 0.2322259 NA 0.9146582 0.2651867
Output:
A B C D E F G
1 0.8609154 0.6660838 0.2827336 0.2862233 0.3166125 0.2187995 0.8313450
2 0.6222994 0.6403106 0.5142511 0.9234335 0.2668208 0.3026934 0.8105986
3 0.6092747 0.6935913 0.2923158 0.5256975 0.4560915 NA NA
4 0.6233794 0.2325505 0.5449748 0.8372956 0.2322259 0.9146582 0.2651867
Answer 1 (score: 0)
# create a function which subsets x by removing NAs
naFilter = function(x) {
  subset(x, !is.na(x))
}
tidydata = as.data.frame( # convert the object into a data.frame
  t( # apply() returns one column per input row, so transpose (see ?apply)
    # apply the filter function per row; this gives a matrix only when
    # every row keeps the same number (here four) of non-NA values
    apply(yourdata, 1, naFilter)
  )
)
# rename data.frame columns
colnames(tidydata) = c("A", "B", "C", "D")
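apply() simplifies its per-row results into a matrix only when every row yields the same number of values; each result then becomes a column, which is why the answer transposes. A minimal sketch with made-up yourdata in which every row has exactly four non-missing values, plus a guard (an addition, not part of the answer) that stops early when that assumption fails:

```r
# Hypothetical input: six columns, exactly four non-NA values per row
yourdata <- data.frame(
  V1 = c(1, NA), V2 = c(NA, 5), V3 = c(2, 6),
  V4 = c(3, NA), V5 = c(NA, 7), V6 = c(4, 8)
)

naFilter <- function(x) subset(x, !is.na(x))

# Guard: with unequal counts apply() would return a list, not a matrix
stopifnot(all(rowSums(!is.na(yourdata)) == 4))

res <- apply(yourdata, 1, naFilter)
dim(res)  # 4 2: one column per input row

tidydata <- as.data.frame(t(res))
colnames(tidydata) <- c("A", "B", "C", "D")
tidydata
```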