Count the non-NA values in each row

Time: 2017-07-26 08:45:10

Tags: r data.table

library(data.table)

family_id <- c(1, 2, 3)
age_mother <- c(30, 27, 29)
dob_child1 <- c("1998-11-12", "1999-12-12", "1996-04-12")  # birth date of the first child
dob_child2 <- c(NA, "1997-09-09", NA)                      # NA if there is no such child
dob_child3 <- c(NA, "1999-09-01", "1996-09-09")
DT <- data.table(family_id, age_mother, dob_child1, dob_child2, dob_child3)

Now that I have DT, how can I use this table to find out how many children each family has, using syntax like this:

DT[,apply..,keyby=family_id]##this code is wrong

2 answers:

Answer 0: (score: 0)

You can use the sqldf package to run SQL queries in R.

I reproduced your DT.

library(data.table)

family_id <- c(1, 2, 3)
age_mother <- c(30, 27, 29)
dob_child1 <- c("1998-11-12", "1999-12-12", "1996-04-12")  # birth date of the first child
dob_child2 <- c(NA, "1997-09-09", NA)                      # NA if there is no such child
dob_child3 <- c(NA, "1999-09-01", "1996-09-09")
DT <- data.table(family_id, age_mother, dob_child1, dob_child2, dob_child3)

library(sqldf)

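# count() skips NULLs (R's NA), so the three counts add up to the number of children;
# group by already yields one row per family, so distinct is not needed
sqldf('select (count(dob_child1) + count(dob_child2) + count(dob_child3)) as total_child,
       family_id from DT group by family_id')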

The result is as follows:

  total_child family_id
1           1         1
2           3         2
3           2         3

Is this what you are looking for?
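The same "group by and count the non-missing values" idea can probably be expressed without SQL by reshaping to long format first. The following is only a sketch under assumptions: the names `long`, `child_slot`, `dob`, and `child_counts` are made up for illustration, and a family with no children at all would disappear from the result because all of its rows are dropped.

```r
library(data.table)

DT <- data.table(
  family_id  = c(1, 2, 3),
  age_mother = c(30, 27, 29),
  dob_child1 = c("1998-11-12", "1999-12-12", "1996-04-12"),
  dob_child2 = c(NA, "1997-09-09", NA),
  dob_child3 = c(NA, "1999-09-01", "1996-09-09")
)

# Reshape to one row per (family, child slot); na.rm = TRUE drops the empty slots
long <- melt(DT,
             id.vars = "family_id",
             measure.vars = patterns("^dob_child"),
             variable.name = "child_slot",  # hypothetical column name
             value.name = "dob",            # hypothetical column name
             na.rm = TRUE)

# Count the remaining rows per family, sorted by family_id
child_counts <- long[, .(total_child = .N), keyby = family_id]
child_counts
```

If families without children must still appear with a count of 0, the row-wise count on the wide table is likely the simpler choice.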

Answer 1: (score: 0)

This may also work:

> DT$total_child <- as.vector(rowSums(!is.na(DT[, c("dob_child1", 
"dob_child2", "dob_child3")])))
> DT
  family_id age_mother dob_child1 dob_child2 dob_child3 total_child
1         1         30 1998-11-12       <NA>       <NA>           1
2         2         27 1999-12-12 1997-09-09 1999-09-01           3
3         3         29 1996-04-12       <NA> 1996-09-09           2
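Since the question asks for `DT[, ..., keyby = family_id]` syntax, here is a minimal sketch of how the same row-wise count might look in data.table's own idiom. It assumes the three `dob_child*` columns are the only child columns; `child_cols` and `per_family` are names introduced here for illustration.

```r
library(data.table)

DT <- data.table(
  family_id  = c(1, 2, 3),
  age_mother = c(30, 27, 29),
  dob_child1 = c("1998-11-12", "1999-12-12", "1996-04-12"),
  dob_child2 = c(NA, "1997-09-09", NA),
  dob_child3 = c(NA, "1999-09-01", "1996-09-09")
)

child_cols <- c("dob_child1", "dob_child2", "dob_child3")  # hypothetical helper

# Add the count as a new column by reference; .SD holds only the .SDcols columns
DT[, total_child := rowSums(!is.na(.SD)), .SDcols = child_cols]

# Grouped form: one row per family, counting non-NA cells within each group
per_family <- DT[, .(total_child = sum(!is.na(.SD))),
                 keyby = family_id, .SDcols = child_cols]
per_family
```

The first form keeps one row per original row, so it does not need `family_id` to be unique; the grouped form would add up the children across rows if a family appeared more than once.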