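I have a dataset in which each firm has some application numbers every year. I want to know, for each year, whether application numbers were added or dropped.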
firm frimID Application year
A 123 a 2013
A 123 b 2013
A 123 b 2014
A 123 c 2014
A 123 c 2015
B 456 e 2013
B 456 f 2013
B 456 e 2014
B 456 g 2015
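Here, firm A in 2014 keeps "b", drops "a", and adds "c"; by 2015 it keeps "c" and has dropped both "a" and "b".
Firm B in 2014 keeps "e" and drops "f"; by 2015 it has dropped both "e" and "f" and adds "g".
I want to find all of these changes for each firm in each year and count how many applications were kept, how many were dropped, and how many were added. Thanks.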
Answer 0 (score: 1)
First, get the data into a usable format:
require(dplyr)
my_df <-
read.table(text =
'firms assignee_id Appl_No year
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2003
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21678 2004
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2005
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2005
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2005
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2005
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2005',
header = TRUE)
Then aggregate by year and use the lag function to compare each year's count with the previous year's:
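# Count applications per year, then compare each count with the previous year's
my_df %>%
  group_by(year) %>%
  summarise(n_application = n()) %>%                      # applications in each year
  arrange(year) %>%                                       # lag() relies on chronological order
  mutate(previous_year_n_app = lag(n_application)) %>%    # NA for the first year
  mutate(mor_than_last_year = n_application > previous_year_n_app)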
# A tibble: 3 x 4
year n_application previous_year_n_app mor_than_last_year
<int> <int> <int> <lgl>
1 2003 12 NA NA
2 2004 13 12 TRUE
3 2005 5 13 FALSE
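The pipeline above compares only the yearly totals, so it does not show which applications each firm kept, dropped, or added. One possible way to get those counts is to compare the sets of application numbers in consecutive years with `intersect()` and `setdiff()`. The following is a minimal base R sketch, not part of the original answer. The helper name `count_changes`, the toy data frame `toy`, and its column names `firm`, `application`, and `year` are assumptions chosen for illustration, and each year is compared with the firm's previous observed year (which need not be the previous calendar year).

```r
# Toy data from the question (column names are assumed for this sketch)
toy <- data.frame(
  firm        = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  application = c("a", "b", "b", "c", "c", "e", "f", "e", "g"),
  year        = c(2013, 2013, 2014, 2014, 2015, 2013, 2013, 2014, 2015),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every firm and every year after its first,
# count applications kept, dropped, and added relative to the
# firm's previous observed year.
count_changes <- function(df) {
  # Drop duplicate rows so each application counts once per firm-year
  df <- unique(df[, c("firm", "year", "application")])
  out <- list()
  for (f in unique(df$firm)) {
    d <- df[df$firm == f, ]
    years <- sort(unique(d$year))
    # Skip the first year: there is nothing to compare it with
    for (i in seq_along(years)[-1]) {
      cur  <- d$application[d$year == years[i]]
      prev <- d$application[d$year == years[i - 1]]
      out[[length(out) + 1]] <- data.frame(
        firm      = f,
        year      = years[i],
        n_kept    = length(intersect(cur, prev)),  # in both years
        n_dropped = length(setdiff(prev, cur)),    # only in the previous year
        n_added   = length(setdiff(cur, prev)),    # only in the current year
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out)
}

result <- count_changes(toy)
print(result)
```

On the toy data, firm A in 2014 comes out as one kept ("b"), one dropped ("a"), and one added ("c"). For the answer's `my_df`, the columns would need to be renamed first, for example `firms` to `firm` and `Appl_No` to `application`.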