I have a data.frame DT_new with 4 columns.
Sample:
Graduated Work Married Jumlah
2015-05-01 2015-05-02 2015-05-03 20
NA 2015-05-02 2015-05-03 20
NA NA 2015-05-03 20
NA 2015-05-02 NA 20
I need to sum Jumlah grouped by Graduated, Work, or Married.
When the Graduated value is not NA, use the date in Graduated.
When the Graduated value is NA, use Work, and otherwise Married.
The output I want is:
Dates Total
2015-05-01 10
2015-05-02 40
2015-05-03 30
I tried aggregate with group by in R, but it only groups by one column (Graduated), for example:
DT_Totals = DT_new %>%
group_by(Graduated) %>%
summarise(Total= sum(Jumlah)) %>%
arrange(Graduated)
How can I solve this?
Answer 0 (score: 3)
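You need to create the new column first, and then group by it.
I wrote a function that returns, element by element, the first non-NA value across the vectors it is given:
first_not_na <- function(...) {
  # Walk the vectors left to right, filling NA positions from the next vector
  Reduce(list(...), f = function(x, y) {
    x[is.na(x)] <- y[is.na(x)]
    x
  })
}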
You can use it like this:
DT_new %>%
  group_by(Date = first_not_na(Graduated, Work, Married)) %>%
  summarise(Total = sum(Jumlah)) %>%
  arrange(Date)
Or in two steps:
DT_new %>%
  mutate(Date = first_not_na(Graduated, Work, Married)) %>%
  group_by(Date) %>%
  summarise(Total = sum(Jumlah)) %>%
  arrange(Date)
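To check what the helper returns, here is a minimal base R sketch that runs it on the four sample rows from the question. It assumes the three date columns are of class Date, which the question does not state. The data.frame construction and the `dates` variable are only for illustration.

```r
# Same helper as in the answer above: fill NA positions from the next vector.
first_not_na <- function(...) {
  Reduce(function(x, y) {
    x[is.na(x)] <- y[is.na(x)]
    x
  }, list(...))
}

# Sample rows from the question; Date class is an assumption.
DT_new <- data.frame(
  Graduated = as.Date(c("2015-05-01", NA, NA, NA)),
  Work      = as.Date(c("2015-05-02", "2015-05-02", NA, "2015-05-02")),
  Married   = as.Date(c("2015-05-03", "2015-05-03", "2015-05-03", NA)),
  Jumlah    = c(20, 20, 20, 20)
)

# One date per row: Graduated if present, else Work, else Married.
dates <- first_not_na(DT_new$Graduated, DT_new$Work, DT_new$Married)
print(dates)
# "2015-05-01" "2015-05-02" "2015-05-03" "2015-05-02"
```

If dplyr is already loaded, its coalesce() function does the same element-wise first-non-NA selection, so coalesce(Graduated, Work, Married) should be usable in place of the helper.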
Answer 1 (score: 2)
Simply use ifelse to create the new date column:
DT_new %>%
  mutate(Date1 = ifelse(!is.na(Graduated), Graduated, ifelse(!is.na(Work), Work, Married))) %>%
  group_by(Date1) %>%
  summarise(Total = sum(Jumlah)) %>%
  arrange(Date1)
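If the date columns are of class Date (stored internally as numbers), ifelse returns the underlying numbers, so convert the result back with as.Date:
DT_new %>%
  mutate(Date1 = ifelse(!is.na(Graduated), Graduated, ifelse(!is.na(Work), Work, Married))) %>%
  mutate(Date1 = as.Date(Date1, origin = "1970-01-01")) %>%
  group_by(Date1) %>%
  summarise(Total = sum(Jumlah)) %>%
  arrange(Date1)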
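The reason for the extra as.Date step can be seen in a small base R sketch. The two vectors below are made-up illustrative data, not taken from the question.

```r
# Two Date vectors with a missing value (illustrative data).
graduated <- as.Date(c("2015-05-01", NA))
work      <- as.Date(c("2015-05-02", "2015-05-02"))

# ifelse() keeps the values but drops the Date class, leaving day counts.
raw <- ifelse(!is.na(graduated), graduated, work)
print(class(raw))   # "numeric"

# Converting back with the Unix epoch as origin restores the dates.
fixed <- as.Date(raw, origin = "1970-01-01")
print(fixed)        # "2015-05-01" "2015-05-02"
```

Because the numbers are days since 1970-01-01, that origin is the one to pass to as.Date here.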