Diff each subset of a data frame column

Time: 2020-02-22 00:00:55

Tags: r subset diff

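I have a data frame with ID, year, and month columns. I need to group by year and month and get the unique IDs in each group. I then want to compare each group's unique IDs with those of the previous year/month group to find which IDs were added and which were removed.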

This was something of a shot in the dark, but I tried the following, which does not work:

connections <- df %>%
  group_by(year, month) %>%
  arrange(year, month) %>%
  diff_data(unique(as.vector(~ID)), lag(unique(as.vector(~ID))))

Sample data

df <- data.frame(ID=c("A1", "A2", "A3", "A1", "A2","A4", "A1", "A4", "A5"),
year= c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012), 
month= c(1, 2, 3, 1, 2, 3, 1, 2, 3))

Desired Output

1 Answer:

Answer 0: (score: 0)

First, aggregate on both month and year. This approach lists all the IDs added and removed in each month, and uses length to count how many were added and removed each month.


Code

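library(tidyverse)

df %>%
  # Collect the unique IDs for every year/month combination
  aggregate(ID ~ year + month, ., unique, drop = FALSE) %>%
  # Within each month, compare every year with the preceding year
  group_by(month) %>%
  arrange(year) %>%
  mutate(addedID = mapply(setdiff, ID, lag(ID), SIMPLIFY = FALSE),      # IDs present now but not before
         num_addedID = lapply(addedID, length),
         deletedID = mapply(setdiff, lag(ID), ID, SIMPLIFY = FALSE),    # IDs present before but not now
         # The first year of each month has no predecessor, so drop the NA before counting
         num_deletedID = lapply(deletedID, function(x) length(na.omit(x)))) %>%
  ungroup() %>%
  arrange(month, year) %>%
  as.data.frame()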

Output

  year month ID addedID num_addedID deletedID num_deletedID
1 2010     1 A1      A1           1        NA             0
2 2011     1 A1                   0                       0
3 2012     1 A1                   0                       0
4 2010     2 A3      A3           1        NA             0
5 2011     2 A2      A2           1        A3             1
6 2012     2 A4      A4           1        A2             1
7 2010     3 A3      A3           1        NA             0
8 2011     3 A4      A4           1        A3             1
9 2012     3 A5      A5           1        A4             1
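
The answer above compares each month with the same month of the previous year. If the goal is instead to compare each year/month group with the group immediately before it in chronological order, as the question describes, a minimal base R sketch might look like the following. The helper name diff_ids and the output column names are hypothetical, chosen only for illustration, and the sketch assumes the first group has no predecessor, so every ID in it counts as added.

```r
# Sample data from the question
df <- data.frame(
  ID    = c("A1", "A2", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
  year  = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  month = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: diff the unique IDs of each year/month group
# against the chronologically preceding group.
diff_ids <- function(data) {
  # Sort chronologically so that group order matches time order
  data <- data[order(data$year, data$month), ]
  key  <- paste(data$year, data$month, sep = "-")

  # Fix the factor levels to order of appearance; otherwise split()
  # would sort the keys alphabetically ("2010-10" before "2010-2")
  ids <- lapply(split(data$ID, factor(key, levels = unique(key))), unique)

  # Previous group's IDs; the first group is compared with an empty set
  prev <- c(list(character(0)), ids[-length(ids)])

  added   <- Map(setdiff, ids, prev)   # in the current group, not the previous
  removed <- Map(setdiff, prev, ids)   # in the previous group, not the current

  data.frame(
    period    = names(ids),
    added     = vapply(added, paste, character(1), collapse = ",", USE.NAMES = FALSE),
    n_added   = unname(lengths(added)),
    removed   = vapply(removed, paste, character(1), collapse = ",", USE.NAMES = FALSE),
    n_removed = unname(lengths(removed)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

result <- diff_ids(df)
print(result)
```

Collapsing the added and removed IDs into comma-separated strings keeps the result a plain data frame; keeping them as list columns, as the dplyr answer does, is an equally reasonable choice when the IDs need further processing.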