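Similar to my recent question, I'm trying to summarize a huge data frame by how many unique individuals are represented in it and how many times each one is represented. I can do this with exact matching, but I need help with near matching.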
For example, the data I have:
```r
df1 <- data.frame(name = c("Ann", "Betsy", "Charlie", "Dave", "Betsy", "Ann"),
                  surname = c("Smith", "Jones", "Parker", "Rees", "Jones", "Smith"),
                  encounter = c(as.Date("2000-01-01", "%Y-%m-%d"),
                                as.Date("2001-01-01", "%Y-%m-%d"),
                                as.Date("2002-01-01", "%Y-%m-%d"),
                                as.Date("2003-01-01", "%Y-%m-%d"),
                                as.Date("2001-01-01", "%Y-%m-%d"),
                                as.Date("2000-01-10", "%Y-%m-%d")),
                  stringsAsFactors = FALSE)
```
What I want is something similar to:
```r
library(dplyr)

# Count rows per exact (name, surname, encounter) combination
df1 %>%
  group_by(name, surname, encounter) %>%
  summarise(n = n())
```
which gives:
```
# A tibble: 5 x 4
# Groups:   name, surname [4]
  name    surname encounter      n
  <chr>   <chr>   <date>     <int>
1 Ann     Smith   2000-01-01     1
2 Ann     Smith   2000-01-10     1
3 Betsy   Jones   2001-01-01     2
4 Charlie Parker  2002-01-01     1
5 Dave    Rees    2003-01-01     1
```
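The number of unique individuals mentioned at the start can be read off the same data. A minimal sketch, assuming (as in the example) that an individual is identified by the `name`/`surname` pair; the variable name `n_individuals` is only illustrative:

```r
library(dplyr)

df1 <- data.frame(name = c("Ann", "Betsy", "Charlie", "Dave", "Betsy", "Ann"),
                  surname = c("Smith", "Jones", "Parker", "Rees", "Jones", "Smith"),
                  encounter = as.Date(c("2000-01-01", "2001-01-01", "2002-01-01",
                                        "2003-01-01", "2001-01-01", "2000-01-10")),
                  stringsAsFactors = FALSE)

# Assumption: one individual = one distinct name/surname combination.
# n_distinct() with several vectors counts unique combinations across them.
n_individuals <- n_distinct(df1$name, df1$surname)
print(n_individuals)
```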
But how can I group them when their encounter dates fall within +/- 30 days of each other rather than being exactly identical, producing something like:

```
# A tibble: 4 x 5
# Groups:   name, surname [4]
  name    surname encounter      n encounter2
  <chr>   <chr>   <date>     <dbl> <date>
1 Ann     Smith   2000-01-01     2 2000-01-10
2 Betsy   Jones   2001-01-01     2 NA
3 Charlie Parker  2002-01-01     1 NA
4 Dave    Rees    2003-01-01     1 NA
```
The `encounter2` column is optional, but it would be a nice bonus.
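One possible approach is a "gap and island" grouping: sort each person's encounters and start a new episode whenever the gap to the previous encounter exceeds 30 days. This is a minimal sketch under stated assumptions, not a definitive answer. It assumes "within 30 days" means within 30 days of the *previous* encounter (so a chain of encounters each 20 days apart collapses into one episode even if the first and last are more than 30 days apart), that `encounter2` should be the second *distinct* date in an episode (which is why Betsy gets `NA`), and dplyr >= 1.0 for `.groups`. `window_days` is a hypothetical parameter name.

```r
library(dplyr)

df1 <- data.frame(name = c("Ann", "Betsy", "Charlie", "Dave", "Betsy", "Ann"),
                  surname = c("Smith", "Jones", "Parker", "Rees", "Jones", "Smith"),
                  encounter = as.Date(c("2000-01-01", "2001-01-01", "2002-01-01",
                                        "2003-01-01", "2001-01-01", "2000-01-10")),
                  stringsAsFactors = FALSE)

window_days <- 30  # hypothetical threshold: max gap (in days) within one episode

res <- df1 %>%
  arrange(name, surname, encounter) %>%
  group_by(name, surname) %>%
  # New episode when the gap to the previous encounter exceeds window_days;
  # the first encounter of each person always opens episode 1.
  mutate(episode = cumsum(c(TRUE, as.numeric(diff(encounter)) > window_days))) %>%
  group_by(name, surname, episode) %>%
  summarise(n = n(),
            # Second distinct date in the episode, or NA if there is only one
            encounter2 = {
              d <- sort(unique(encounter))
              if (length(d) > 1) d[2] else as.Date(NA)
            },
            # Computed last so encounter2 still sees the raw dates
            encounter = min(encounter),
            .groups = "drop") %>%
  select(name, surname, encounter, n, encounter2)

print(res)
```

Here `n` stays an integer count rather than the `<dbl>` shown in the desired output. If an episode can contain more than two distinct dates, only the second one is kept in `encounter2`; a list column or a pasted string would be needed to keep them all.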