I have a data frame of this form:
familyid memberid year contract months
1 1 2000 1 12
1 1 2001 1 12
1 1 2002 1 12
1 1 2003 1 12
2 3 2000 2 12
2 3 2001 2 12
2 3 2002 2 12
2 3 2003 2 12
3 2 2000 1 5
3 2 2000 2 5
3 2 2001 1 12
3 2 2002 1 12
3 2 2003 1 12
4 1 2000 2 12
4 1 2001 2 12
4 1 2002 2 12
4 1 2003 2 12
5 2 2000 1 8
5 2 2001 1 12
5 2 2002 1 12
5 2 2003 1 4
5 2 2003 1 6
I would like to get a data frame like this:
familyid memberid year contract months
1 1 2000 1 12
1 1 2001 1 12
1 1 2002 1 12
1 1 2003 1 12
2 3 2000 2 12
2 3 2001 2 12
2 3 2002 2 12
2 3 2003 2 12
4 1 2000 2 12
4 1 2001 2 12
4 1 2002 2 12
4 1 2003 2 12
5 2 2000 1 8
5 2 2001 1 12
5 2 2002 1 12
**5 2 2003 1 10**
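Basically, I want to sum the variable months when rows for the same familyid show the same value of the variable contract within a year (in my example, for familyid = 5 in year = 2003, I sum 6 and 4). However, I also want to drop families that show two different values of contract in the same year (in my case, I drop familyid = 3 because it shows contract = 1 and contract = 2 in year = 2000). All other observations should stay unchanged.
Does anyone know how to do this?
Thanks to anyone who can help. Marco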
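Answer 0 (score: 1)
You mention that you want the total months for each contract a family holds within a year, and that you also want to remove entirely any family that has more than one contract in a single year. Here is one way to do it: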
library(dplyr)
df2 <- df %>%
  # Sum months within each family/member/year/contract combination
  group_by(familyid, memberid, year, contract) %>%
  summarize(months = sum(months, na.rm = TRUE)) %>%
  # We need this to answer the second part. How many contracts did this family have this year?
  # summarize() drops the last grouping level (contract), so n() counts rows per family/member/year.
  mutate(contracts_this_yr = n()) %>%
  ungroup() %>%
  # Only include the families with no years of multiple contracts
  group_by(familyid, memberid) %>%
  filter(max(contracts_this_yr) < 2) %>%
  ungroup()
Output:
df2
# A tibble: 16 x 5
familyid memberid year contract months
<int> <int> <int> <int> <int>
1 1 1 2000 1 12
2 1 1 2001 1 12
3 1 1 2002 1 12
4 1 1 2003 1 12
5 2 3 2000 2 12
6 2 3 2001 2 12
7 2 3 2002 2 12
8 2 3 2003 2 12
9 4 1 2000 2 12
10 4 1 2001 2 12
11 4 1 2002 2 12
12 4 1 2003 2 12
13 5 2 2000 1 8
14 5 2 2001 1 12
15 5 2 2002 1 12
16 5 2 2003 1 10
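For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data from the question and reaches the same 16 rows by a slightly different route: it first drops the families with conflicting contracts using `n_distinct()`, then sums the months. The names `df_alt` and `n_contracts` are made up for illustration, the columns are assumed to be plain numerics, and the `.groups` argument assumes dplyr 1.0.0 or later. Note that the filter here groups by `familyid` alone, following the wording of the question, whereas the code above groups by `familyid` and `memberid`; on this data the two give the same result.

```r
library(dplyr)

# Rebuild the example data from the question (22 rows).
rows_per_family <- c(4, 4, 5, 4, 5)
df <- data.frame(
  familyid = rep(1:5, times = rows_per_family),
  memberid = rep(c(1, 3, 2, 1, 2), times = rows_per_family),
  year     = c(2000:2003, 2000:2003,
               2000, 2000, 2001:2003,
               2000:2003,
               2000:2003, 2003),
  contract = c(rep(1, 4), rep(2, 4),
               1, 2, 1, 1, 1,
               rep(2, 4), rep(1, 5)),
  months   = c(rep(12, 4), rep(12, 4),
               5, 5, 12, 12, 12,
               rep(12, 4),
               8, 12, 12, 4, 6)
)

df_alt <- df %>%
  # Step 1: count the distinct contract values each family shows in each year.
  group_by(familyid, year) %>%
  mutate(n_contracts = n_distinct(contract)) %>%
  # Keep a family only if every one of its years has a single contract value.
  group_by(familyid) %>%
  filter(all(n_contracts == 1)) %>%
  # Step 2: sum months within each family/member/year/contract combination.
  group_by(familyid, memberid, year, contract) %>%
  summarise(months = sum(months, na.rm = TRUE), .groups = "drop")

print(df_alt)
```

Filtering before summing keeps the helper column out of the final result, since `summarise()` only returns the grouping columns and `months`.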