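How can I preserve colnames through `apply` and `lapply` operations?

Asked: 2013-12-09 15:25:28

Tags: r dataframe apply names lapply

I have a data.frame called RawHM. For each row, I want to evaluate the column sets defined by the entries of the list AllList and check whether there are enough non-NA observations (at least 2) to keep that row's values for the column set. If there are not, the values for that column set should be replaced with NA.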

AllList (a named list of column sets):

> dput(AllList)
structure(list(EGI = c("OO", "PP", "QQ"), Ref = c("RR", "SS", 
"TT")), .Names = c("EGI", "Ref"))

RawHM:

> dput(head(RawHM,10))
structure(list(OO = c(2.26128283268031, NA, NA, NA, 3.1189673217816, 
2.68131772865193, 1.50542478607416, NA, NA, NA), PP = c(NA, 2.86537733048028, 
2.02969026818987, NA, 2.54112005565494, 3.01623803266379, 1.73909499803785, 
2.49712237003491, NA, 1.67635525591635), QQ = c(NA, NA, 1.91968060122123, 
NA, NA, 2.63463138625395, NA, NA, NA, NA), RR = c(NA, NA, NA, 
NA, NA, 1.01488582084669, 1.01944283768403, NA, 1.06329113924051, 
NA), SS = c(0.950310559006211, 0.924124326404927, 1.07886334610473, 
0.951793999929161, 0.847931452310888, 0.879173290937997, 0.882126364182319, 
NA, NA, 0.713085668766746), TT = c(NA, NA, 1.09812749411644, 
NA, 0.9994646420402, 1.21090641120118, 1.25090285854196, NA, 
NA, NA)), .Names = c("OO", "PP", "QQ", "RR", "SS", "TT"), row.names = c(1L, 
2L, 15L, 16L, 23L, 24L, 25L, 30L, 36L, 40L), class = "data.frame")

I tried writing a function:

func <- function(x) {
  unlist(lapply(AllList, function(y) {
    if (length(na.omit(x[unlist(y)])) < 2) {
      rep(NA, length(unlist(y)))
    } else {
      x[unlist(y)]
    }
  }))
}

and then:

output<-t(apply(RawHM,1,func))

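This works in principle, but it does not preserve the colnames, which I want to be the same as in the RawHM data frame. I would rather avoid renaming the columns afterwards.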

> dput(head(output,10))
structure(c(NA, NA, NA, NA, 3.1189673217816, 2.68131772865193, 
1.50542478607416, NA, NA, NA, NA, NA, 2.02969026818987, NA, 2.54112005565494, 
3.01623803266379, 1.73909499803785, NA, NA, NA, NA, NA, 1.91968060122123, 
NA, NA, 2.63463138625395, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
1.01488582084669, 1.01944283768403, NA, NA, NA, NA, NA, 1.07886334610473, 
NA, 0.847931452310888, 0.879173290937997, 0.882126364182319, 
NA, NA, NA, NA, NA, 1.09812749411644, NA, 0.9994646420402, 1.21090641120118, 
1.25090285854196, NA, NA, NA), .Dim = c(10L, 6L), .Dimnames = list(
    c("1", "2", "15", "16", "23", "24", "25", "30", "36", "40"
    ), NULL))

Any help is very welcome :-) Regards, Mads

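1 answer:

Answer 0 (score: 0)

func is a very strange function... funky, even!

When you use apply, your data is converted from a data.frame to a matrix, and each row is passed to your function as a named vector. Your function behaves differently when it is given a data.frame row rather than a matrix row: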

> func(RawHM[1,])
   EGI.OO    EGI.PP    EGI.QQ    Ref.RR    Ref.SS    Ref.TT 
2.2612828        NA        NA        NA 0.9503106        NA 
> func(as.matrix(RawHM)[1,])
EGI1 EGI2 EGI3 Ref1 Ref2 Ref3 
  NA   NA   NA   NA   NA   NA 

Note that you get different results and different names!
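The difference seems to come from how `na.omit` and `length` treat the two inputs. The following is a minimal sketch, assuming a one-row data.frame and a named vector that mimic the first three columns of row 1 of RawHM (values rounded; `row_df`, `row_vec`, `n_df` and `n_vec` are names made up for this illustration):

```r
# First three columns of row 1 of RawHM, values rounded for brevity.
row_df  <- data.frame(OO = 2.261, PP = NA_real_, QQ = NA_real_)  # like RawHM[1, ]
row_vec <- c(OO = 2.261, PP = NA, QQ = NA)                       # like a row passed by apply()

# na.omit() on a data.frame drops whole rows, and length() of a data.frame
# counts its columns, so the "fewer than 2" test does not fire here.
n_df <- length(na.omit(row_df))

# On a named vector, na.omit() drops the NA elements themselves.
n_vec <- length(na.omit(row_vec))

print(c(data.frame = n_df, vector = n_vec))
```

So on a data.frame row the check counts columns rather than non-NA values, which is why `func(RawHM[1,])` keeps values that the matrix version replaces with NA.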

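In any case, the naming problem stems from the fact that the NAs you generate have no names, so the per-row results have inconsistent names and apply drops them. To fix this, here is a modification:

func2 <- function(x) {
  unlist(lapply(AllList, function(y) {
    if (length(na.omit(x[unlist(y)])) < 2) {
      sapply(y, function(z) NA)  # named NAs: sapply names them after the columns in y
    } else {
      x[unlist(y)]
    }
  }))
}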

> t(apply(RawHM,1,func2))
     EGI.OO   EGI.PP   EGI.QQ   Ref.RR    Ref.SS    Ref.TT
1        NA       NA       NA       NA        NA        NA
2        NA       NA       NA       NA        NA        NA
15       NA 2.029690 1.919681       NA 1.0788633 1.0981275
16       NA       NA       NA       NA        NA        NA
23 3.118967 2.541120       NA       NA 0.8479315 0.9994646
24 2.681318 3.016238 2.634631 1.014886 0.8791733 1.2109064
25 1.505425 1.739095       NA 1.019443 0.8821264 1.2509029
30       NA       NA       NA       NA        NA        NA
36       NA       NA       NA       NA        NA        NA
40       NA       NA       NA       NA        NA        NA
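
If the goal is to get back exactly the column names of RawHM (OO, PP, ...) rather than the prefixed EGI.OO style, one option is to drop the list names before calling `unlist`, so that only the inner column names survive. The following is only a sketch under assumptions: `func3` and `mask_sparse_groups` are hypothetical names, and the data is a three-row subset of RawHM (rows 1, 15 and 24) with values rounded.

```r
AllList <- list(EGI = c("OO", "PP", "QQ"), Ref = c("RR", "SS", "TT"))

# Rows 1, 15 and 24 of RawHM, rounded to three decimals.
RawHM <- data.frame(
  OO = c(2.261, NA, 2.681), PP = c(NA, 2.030, 3.016), QQ = c(NA, 1.920, 2.635),
  RR = c(NA, NA, 1.015),    SS = c(0.950, 1.079, 0.879), TT = c(NA, 1.098, 1.211),
  row.names = c("1", "15", "24")
)

# Variant of func2: x is the named vector that apply() passes for each row.
func3 <- function(x) {
  pieces <- lapply(AllList, function(cols) {
    vals <- x[cols]
    if (sum(!is.na(vals)) < 2) vals[] <- NA  # vals[] keeps the names
    vals
  })
  unlist(unname(pieces))  # no list names, so no "EGI." / "Ref." prefixes
}

output <- t(apply(RawHM, 1, func3))

# Alternative without apply(): blank out each column set for the rows
# that have fewer than min_obs non-NA values in that set.
mask_sparse_groups <- function(df, groups, min_obs = 2) {
  for (cols in groups) {
    too_few <- rowSums(!is.na(df[cols])) < min_obs
    df[too_few, cols] <- NA
  }
  df
}

output_df <- mask_sparse_groups(RawHM, AllList)

print(colnames(output))
print(output_df)
```

The first version stays close to the original apply/lapply approach and returns a matrix. The second returns a data.frame, so the column names never leave the object in the first place.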