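Data processing: how to find the changes each year, i.e. whether observation names are added or dropped from year to year

Time: 2019-12-27 14:46:23

Tags: r data-processing

I have a dataset in which each firm has a number of application numbers every year. I want to know, for each year, whether application numbers were added or dropped.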

   firm  firmID  Application  year
    A     123         a      2013
    A     123         b      2013
    A     123         b      2014
    A     123         c      2014
    A     123         c      2015
    B     456         e      2013
    B     456         f      2013
    B     456         e      2014
    B     456         g      2015

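Here, firm A in 2014 keeps "b", drops "a", and adds "c"; in 2015 it keeps "c", and by then both "a" and "b" have been dropped.

Firm B in 2014 keeps "e" and drops "f"; in 2015 both "e" and "f" have been dropped, and "g" is added.

I want to see all of these changes for each firm and each year, and to count them: how many applications were dropped and how many were added. Thanks.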

1 answer:

Answer 0: (score: 1)

First, get the data into a usable format

library(dplyr)  # provides %>%, group_by(), summarise(), arrange(), mutate(), lag()

my_df <- 
  read.table(text = 
'firms assignee_id Appl_No year 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21678 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2005', 
header = TRUE)    

Then summarise by year, and use the lag function to compare each year with the previous one

my_df %>%
  group_by(year) %>%
  summarise(n_application = n()) %>%                    # number of applications per year
  arrange(year) %>%
  mutate(previous_year_n_app = lag(n_application)) %>%  # count from the previous row (year)
  mutate(more_than_last_year = n_application > previous_year_n_app)

# A tibble: 3 x 4
   year n_application previous_year_n_app more_than_last_year
  <int>         <int>               <int> <lgl>
1  2003            12                  NA NA
2  2004            13                  12 TRUE
3  2005             5                  13 FALSE
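The summary above compares only the total counts from one year to the next. If the goal is also to count which application numbers were kept, added, or dropped per firm, one possible approach is to compare each year's set of applications with the previous year's set. The following is a minimal sketch in base R, not part of the original answer. It uses the example data from the question (with the ID column spelled `firmID`), and `firm_changes`, `apps`, and `changes` are names made up for illustration. It assumes that "previous year" means the previous year observed for that firm, so a gap in the years is not treated specially.

```r
# Example data from the question (ID column spelled firmID here).
apps <- read.table(text = "firm firmID Application year
A 123 a 2013
A 123 b 2013
A 123 b 2014
A 123 c 2014
A 123 c 2015
B 456 e 2013
B 456 f 2013
B 456 e 2014
B 456 g 2015", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: for one firm's rows, compare each observed year
# with the previous observed year using set operations.
firm_changes <- function(d) {
  by_year <- split(d$Application, d$year)   # one vector of applications per year, sorted by year
  idx <- seq_len(length(by_year) - 1)       # position of each "previous" year
  count <- function(f) {
    vapply(idx, function(i) length(f(by_year[[i]], by_year[[i + 1]])), integer(1))
  }
  data.frame(
    firm      = rep(d$firm[1], length(idx)),
    year      = as.integer(names(by_year)[-1]),
    n_kept    = count(intersect),                               # in both years
    n_added   = count(function(prev, cur) setdiff(cur, prev)),  # new this year
    n_dropped = count(setdiff),                                 # in previous year only
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each firm and stack the results.
changes <- do.call(rbind, lapply(split(apps, apps$firm), firm_changes))
rownames(changes) <- NULL
print(changes)
```

On the question's data this gives, for firm A in 2014, one application kept ("b"), one added ("c"), and one dropped ("a"). The first year of each firm has no row because there is nothing to compare it with.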