I have the following tibble:
my_tbl <- tribble(
~year, ~event_id, ~winner_id,
2011, "A", 4322,
2011, "A", 9604,
2011, "A", 1180,
2013, "A", 4322,
2013, "A", 9604,
2013, "A", 1663,
2014, "A", 4322,
2016, "A", 5478,
2017, "A", 4322,
2017, "A", 1663,
2011, "B", 4322,
2013, "B", 7893,
2013, "B", 1188,
2014, "B", 7893,
2016, "B", 2365,
2017, "B", 3407,
2011, "C", 5556,
2013, "C", 5556,
2014, "C", 1238,
2016, "C", 2391,
2017, "C", 2391,
2011, "D", 4219,
2013, "D", 7623,
2014, "D", 8003,
2016, "D", 2851,
2017, "D", 0418
)
I want to find the most consecutive wins per event ID. The result I am looking for looks like this:
results_summary_tbl <- tribble(
~event_id, ~most_wins_in_a_row, ~number_of_winners, ~winners, ~years,
"A", 3, 1, "4322", "4322 = (2011, 2013, 2014)",
"C", 2, 2, "5556 , 2391", "5556 = (2011, 2013), 2391 = (2016, 2017)",
"B", 2, 1, "7893", "7893 = (2013, 2014)",
"D", 1, 5, "4219 , 7623 , 8003 , 2851 , 0418", "4219 = (2011), 7623 = (2013), 8003 = (2014), 2851 = (2016), 0418 = (2017)"
)
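Note that some years are missing because no events took place in those years.
The following snippet was given to me, but it does not work because of the missing years: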
my_tbl %>% arrange(event_id, winner_id, year) %>%
group_by(event_id, winner_id) %>%
mutate(run = cumsum(year - lag(year, default = first(year)) > 1)) %>%
count(event_id, winner_id, run) %>%
group_by(event_id) %>%
summarise(most_wins_in_a_row = max(n),
number_of_winners = sum(n == most_wins_in_a_row),
winners = paste0(winner_id[n == most_wins_in_a_row], collapse = ","))
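The snippet above starts a new run whenever two winning years differ by more than 1, so 2011 followed by 2013 counts as a break even though no event was held in 2012. One possible workaround, shown here only as a sketch, is to compare positions among the years each event was actually held instead of calendar years. The names `demo_tbl`, `edition`, `streaks`, and `wins_in_a_row` are introduced for illustration, and the data is a small excerpt of event "A". The sketch assumes every year in which an event was held appears at least once in the data.

```r
library(dplyr)
library(tibble)

# Excerpt of event "A" from the question's data (illustrative subset)
demo_tbl <- tribble(
  ~year, ~event_id, ~winner_id,
  2011, "A", 4322,
  2011, "A", 9604,
  2013, "A", 4322,
  2013, "A", 9604,
  2014, "A", 4322,
  2016, "A", 5478,
  2017, "A", 4322
)

streaks <- demo_tbl %>%
  group_by(event_id) %>%
  # edition = position of the year among the years this event was held,
  # so 2011, 2013, 2014 become 1, 2, 3 and count as consecutive
  mutate(edition = dense_rank(year)) %>%
  arrange(event_id, winner_id, edition) %>%
  group_by(event_id, winner_id) %>%
  # a new run starts whenever the winner skipped at least one edition
  mutate(run = cumsum(edition - lag(edition, default = first(edition)) > 1)) %>%
  ungroup() %>%
  count(event_id, winner_id, run, name = "wins_in_a_row")

print(streaks)
```

From `streaks`, the same `summarise()` step as in the snippet above could be applied per `event_id`, using `wins_in_a_row` in place of `n`.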
Answer 0 (score: 1)
I ran into trouble with the missing years, so I fell back on a base R approach using rle. As an example, take the subset where event_id == "A":
z = my_tbl[my_tbl$event_id == "A", ]
To see every year this event was held and who won, I tabulate year against winner:
table(z$year, z$winner_id)
       1180 1663 4322 5478 9604
  2011    1    0    1    0    1
  2013    0    1    1    0    1
  2014    0    0    1    0    0
  2016    0    0    0    1    0
  2017    0    1    1    0    0
Because the table only has rows for years in which the event was held, the missing years drop out, and the problem reduces to finding the longest run of consecutive 1s in each column. For that I use rle:
apply(table(z$year, z$winner_id), 2, function(i) {
  k = rle(i)
  max(k$lengths[k$values == 1])
})
1180 1663 4322 5478 9604
   1    1    3    1    2
For the event_id == "A" subset, this shows that 4322 has the longest streak. From here it is easy to write the desired output into a data.frame. All that remains is to apply this function to every subset of the data.
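The per-event steps might be wrapped up and applied to every event roughly as follows. This is a sketch under stated assumptions, not the answer's own code: `longest_streaks()` is a hypothetical helper name, and `demo_tbl` is an excerpt of the question's data (events "A" and "C") so that the block runs on its own.

```r
# Excerpt of the question's data so the sketch is self-contained
demo_tbl <- data.frame(
  year      = c(2011, 2011, 2013, 2013, 2014, 2016, 2017,
                2011, 2013, 2014, 2016, 2017),
  event_id  = c("A", "A", "A", "A", "A", "A", "A",
                "C", "C", "C", "C", "C"),
  winner_id = c(4322, 9604, 4322, 9604, 4322, 5478, 4322,
                5556, 5556, 1238, 2391, 2391),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise the longest winning streak for one event
longest_streaks <- function(z) {
  # rows = years the event was held, columns = winners, cells = 0/1
  wins <- table(z$year, z$winner_id)
  # longest run of consecutive 1s per winner
  streak <- apply(wins, 2, function(col) {
    k <- rle(as.vector(col))
    max(k$lengths[k$values == 1])
  })
  best <- max(streak)
  data.frame(
    event_id           = z$event_id[1],
    most_wins_in_a_row = best,
    number_of_winners  = sum(streak == best),
    winners            = paste(names(streak)[streak == best], collapse = ", "),
    stringsAsFactors   = FALSE
  )
}

# Split by event, summarise each piece, and stack the rows
results <- do.call(rbind, lapply(split(demo_tbl, demo_tbl$event_id), longest_streaks))
print(results)
```

Winners are listed in the column order of the table (sorted by ID), not in the order their streaks occurred. Building the years column from the desired output would need an extra step that keeps the row names of `wins` for each run.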