Find the longest consecutive streak in a tibble

Time: 2019-11-20 16:27:15

Tags: r tibble

I have the following tibble:

my_tbl <- tribble(
  ~year, ~event_id, ~winner_id,
  2011,      "A",     4322,
  2011,      "A",     9604,
  2011,      "A",     1180,
  2013,      "A",     4322,
  2013,      "A",     9604,
  2013,      "A",     1663,
  2014,      "A",     4322,
  2016,      "A",     5478,
  2017,      "A",     4322,
  2017,      "A",     1663,
  2011,      "B",     4322,
  2013,      "B",     7893,
  2013,      "B",     1188,
  2014,      "B",     7893,
  2016,      "B",     2365,
  2017,      "B",     3407,
  2011,      "C",     5556,
  2013,      "C",     5556,
  2014,      "C",     1238,
  2016,      "C",     2391,
  2017,      "C",     2391,
  2011,      "D",     4219,
  2013,      "D",     7623,
  2014,      "D",     8003,
  2016,      "D",     2851,
  2017,      "D",     0418
)

I want to find the longest run of consecutive wins for each event ID. The result I am looking for looks like this:

results_summary_tbl <- tribble(
  ~event_id, ~most_wins_in_a_row, ~number_of_winners, ~winners,                              ~years,
   "A",       3,                  1,                   "4322",                               "4322 = (2011, 2013, 2014)",
   "C",       2,                  2,                   "5556 , 2391",                        "5556 = (2011, 2013), 2391 = (2016, 2017)",
   "B",       2,                  1,                   "7893",                               "7893 = (2013, 2014)",
   "D",       1,                  5,                   "4219 , 7623 , 8003 , 2851 , 0418",   "4219 = (2011), 7623 = (2013), 8003 = (2014), 2851 = (2016), 0418 = (2017)"
)

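Note that some years are missing because no events took place in those years.

The following snippet was suggested to me, but it does not work because of the missing years: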

my_tbl %>% arrange(event_id, winner_id, year) %>%
  group_by(event_id, winner_id) %>%
  mutate(run = cumsum(year - lag(year, default = first(year)) > 1)) %>%
  count(event_id, winner_id, run) %>%
  group_by(event_id) %>%
  summarise(most_wins_in_a_row = max(n),
            number_of_winners = sum(n == most_wins_in_a_row),
            winners = paste0(winner_id[n == most_wins_in_a_row], collapse = ","))
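
To make the failure concrete, here is a minimal base R sketch using winner 4322 in event "A". The names `held_years`, `won_years`, `run_by_calendar` and `run_by_edition` are made up for illustration, and the second half is only one possible way to number runs, assuming a streak should be counted over the years in which the event was actually held.

```r
# Years in which event "A" was held, and the years in which 4322 won it
held_years <- c(2011, 2013, 2014, 2016, 2017)
won_years  <- c(2011, 2013, 2014, 2017)

# The snippet's rule: a calendar gap larger than 1 starts a new run.
# 2011 -> 2013 is a gap of 2, so the streak is cut although no event took place in 2012.
run_by_calendar <- cumsum(c(0, diff(won_years)) > 1)
run_by_calendar
# 0 1 1 2   (longest run has only 2 wins)

# Alternative (assumption): compare positions within the years the event was held
edition <- match(won_years, held_years)   # 1 2 3 5
run_by_edition <- cumsum(c(0, diff(edition)) > 1)
run_by_edition
# 0 0 0 1   (longest run has 3 wins: 2011, 2013, 2014)
```

With the position-based index, only the 2016 edition (won by 5478) breaks the streak, which matches the expected result for event "A".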

1 answer:

Answer 0: (score: 1)

I ran into problems with the missing years, so I had to fall back on a base R approach using rle to solve it. For example, take the subset where event_id == "A":

z = my_tbl[my_tbl$event_id == "A", ]

To see every year this event was held and who won it, I cross-tabulate year against winner:

table(z$year, z$winner_id)

       1180 1663 4322 5478 9604
  2011    1    0    1    0    1
  2013    0    1    1    0    1
  2014    0    0    1    0    0
  2016    0    0    0    1    0
  2017    0    1    1    0    0

Because the rows contain only the years in which the event was held, this reduces the problem to finding the longest run of consecutive 1s in each column. For this I use rle:

# For each winner (column), take the longest run of 1s
apply(table(z$year, z$winner_id), 2, function(i) {
  k = rle(i)
  max(k$lengths[k$values == 1])
})
1180 1663 4322 5478 9604 
   1    1    3    1    2

For the subset event_id == "A", this shows that 4322 has the longest streak. With this, it is easy to write the desired output into a data.frame. What remains is to apply this function to every subset of the data.
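
That last step could be sketched as follows. This is not the answerer's own code, just one way to do it in base R. The helper `longest_streaks` and the data frame `my_df` are hypothetical names, and `winner_id` is stored as character here (an assumption) so that "0418" keeps its leading zero. The `years` column from the question is left out to keep the sketch short.

```r
# The question's data as a plain data.frame (winner_id as character: an assumption)
my_df <- data.frame(
  year = c(2011, 2011, 2011, 2013, 2013, 2013, 2014, 2016, 2017, 2017,
           2011, 2013, 2013, 2014, 2016, 2017,
           2011, 2013, 2014, 2016, 2017,
           2011, 2013, 2014, 2016, 2017),
  event_id = rep(c("A", "B", "C", "D"), times = c(10, 6, 5, 5)),
  winner_id = c("4322", "9604", "1180", "4322", "9604", "1663", "4322", "5478", "4322", "1663",
                "4322", "7893", "1188", "7893", "2365", "3407",
                "5556", "5556", "1238", "2391", "2391",
                "4219", "7623", "8003", "2851", "0418"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise the longest streak(s) for one event's rows
longest_streaks <- function(z) {
  wins <- table(z$year, z$winner_id)      # rows = years held, columns = winners
  streak <- apply(wins, 2, function(i) {
    k <- rle(as.vector(i))
    max(k$lengths[k$values == 1])         # longest run of 1s for this winner
  })
  best <- max(streak)
  data.frame(
    event_id = z$event_id[1],
    most_wins_in_a_row = best,
    number_of_winners = sum(streak == best),
    winners = paste(names(streak)[streak == best], collapse = " , "),
    stringsAsFactors = FALSE
  )
}

# Split by event, summarise each piece, and stack the results
results <- do.call(rbind, lapply(split(my_df, my_df$event_id), longest_streaks))
results
```

Each row of `results` corresponds to one event. Winners within an event are listed in the sorted order of the table's column names, so ties may appear in a different order than in the question's expected output.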