I've run into an annoying problem with a summary analysis I'm trying to do using the dplyr
package. It's easiest to explain with some example data:
example_data <- structure(list(Date = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
Name = structure(c(3L, 3L, 4L, 3L, 2L, 3L, 2L, 4L, 1L), .Label = c("George",
"Jack", "John", "Mary"), class = "factor"), Birth.Year = c(1995L,
1995L, 1997L, 1995L, 1999L, 1995L, 1999L, 1997L, 1997L),
Special_Balance = c(10L, 40L, 30L, 5L, 10L, 15L, 2L, 1L,
100L), Total_Balance = c(100L, 100L, 50L, 200L, 20L, 200L,
20L, 100L, 1600L)), .Names = c("Date", "Name", "Birth.Year",
"Special_Balance", "Total_Balance"), class = "data.frame", row.names = c(NA,
-9L))
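My goal is two simple summaries. First, I'd like to summarise by Date, with the code shown below. The part that's wrong is the total_balance_sum calculation, where I want to sum each person's balance, but only once per person. For example, for Date=1 my command gives total_balance_sum=100, but it should be 150 (John's Total_Balance of 100 counted once, plus Mary's Total_Balance of 50 counted once). This wrong calculation obviously messes up the final total_pct calculation.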
example_data %>%
group_by(Date) %>%
summarise(
total_people=n_distinct(Name),
total_loan_exposures=n(),
special_sum=sum(Special_Balance,na.rm=TRUE),
total_balance_sum=sum(Total_Balance[n_distinct(Name)]),
total_pct=special_sum/total_balance_sum
) -> example_summary
In the second summary (below), I group by Date and Birth.Year, and total_balance_sum is again calculated incorrectly.
example_data %>%
group_by(Date,Birth.Year) %>%
summarise(
total_people=n_distinct(Name),
total_loan_exposures=n(),
special_sum=sum(Special_Balance,na.rm=TRUE),
total_balance_sum=sum(Total_Balance[n_distinct(Name)]),
total_pct=special_sum/total_balance_sum
) -> example_summary_birthyear
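What's the correct way to do this? Clearly, the way I'm using n_distinct just picks out one of the values instead of summing them properly across names.
Thanks for your help.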
Answer 0 (score: 2)
I'm a little unclear on exactly what you're asking for, but does this do what you want? (For the first example only.)
example_data %>%
  # Step 1: collapse to one row per person per date
  group_by(Date, Name) %>%
  summarise(
    total_loan_exposures=n(),
    total_SpecialPerson=sum(Special_Balance,na.rm=TRUE),
    total_balance_sumPerson=Total_Balance[1])%>%
  ungroup() %>%
  # Step 2: add up the per-person rows within each date
  group_by(Date) %>%
  summarise(
    total_people=n(),
    total_loan_exposures=sum(total_loan_exposures),
    special_sum=sum(total_SpecialPerson,na.rm=TRUE),
    total_balance_sum=sum(total_balance_sumPerson)) %>%
  mutate(total_pct=(special_sum/total_balance_sum))-> example_summary
> example_summary
Source: local data frame [3 x 6]
Date total_people total_loan_exposures special_sum total_balance_sum total_pct
1 1 2 3 80 150 0.53333333
2 2 2 4 32 220 0.14545455
3 3 2 2 101 1700 0.05941176
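To see why the expression in the question returns 100 for Date 1, here is a minimal base-R sketch. The vectors name and total_balance are just the Date 1 rows copied by hand, and length(unique()) stands in for n_distinct(); both are illustrative assumptions, not part of the original code. The count of distinct names is used as a position, so only one element is ever picked, whereas taking one balance per person and summing gives the intended total.

```r
# Rows for Date == 1, copied by hand from the example data
name <- c("John", "John", "Mary")
total_balance <- c(100L, 100L, 50L)

# The question's expression: the number of distinct names (2) is used
# as an index, so this selects the second element only
idx <- length(unique(name))   # stand-in for n_distinct(Name)
wrong <- sum(total_balance[idx])

# The two-step idea from the answer: one balance per person, then sum
per_person <- tapply(total_balance, name, function(x) x[1])
right <- sum(per_person)

print(c(wrong = wrong, right = right))
```

This mirrors the Total_Balance[1] step in the grouped summarise above, just without dplyr.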
Answer 1 (score: 1)
For the second example (for the first example, just remove Birth.Year):
library(dplyr)
example_data %>% group_by(Date, Birth.Year) %>%
mutate(special_sum = sum(Special_Balance),
total_loan_exposure = n( )) %>%
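# .keep_all = TRUE is needed on dplyr >= 0.5.0, where distinct() otherwise drops the other columns
distinct(Name, Total_Balance, .keep_all = TRUE) %>%
summarise(Total_balance_sum = sum(Total_Balance),
special_sum = special_sum[1],
total_people = n(),
total_loan_exposure = total_loan_exposure[1],
total_pct = special_sum/Total_balance_sum)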
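A possible single-pass variant, offered as a sketch rather than a definitive solution: inside summarise, Total_Balance can be filtered with !duplicated(Name) so that each person contributes only the first balance seen in the group. This assumes a person's Total_Balance is the same on every row within a group, which holds for the example data but may not hold in general. The data frame is rebuilt here with plain character names so that the block runs on its own.

```r
library(dplyr)

# Same values as the example data, with Name as a plain character column
example_data <- data.frame(
  Date = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L),
  Name = c("John", "John", "Mary", "John", "Jack", "John", "Jack", "Mary", "George"),
  Birth.Year = c(1995L, 1995L, 1997L, 1995L, 1999L, 1995L, 1999L, 1997L, 1997L),
  Special_Balance = c(10L, 40L, 30L, 5L, 10L, 15L, 2L, 1L, 100L),
  Total_Balance = c(100L, 100L, 50L, 200L, 20L, 200L, 20L, 100L, 1600L)
)

by_date <- example_data %>%
  group_by(Date) %>%
  summarise(
    total_people = n_distinct(Name),
    total_loan_exposures = n(),
    special_sum = sum(Special_Balance, na.rm = TRUE),
    # keep only the first row for each Name within the group
    # (assumes the balance is constant per person within a group)
    total_balance_sum = sum(Total_Balance[!duplicated(Name)]),
    total_pct = special_sum / total_balance_sum
  )

print(by_date)
```

For the second summary, the same summarise call should work after changing the grouping to group_by(Date, Birth.Year).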