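I have a data frame with date data and cumulative counts.
I am trying to convert the cumulative counts into per-period counts (from Dataframe A to Dataframe B below) with tidyr.
This is the code: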
df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
Year = c(2012:2014, 2012:2014))
Dataframe A
cum_count state Year
1 5 Alabama 2012
2 14 Alabama 2013
3 50 Alabama 2014
4 5 NY 2012
5 14 NY 2013
6 50 NY 2014
Dataframe B
cum_count state Year
1 5 Alabama 2012
2 9 Alabama 2013
3 36 Alabama 2014
4 5 NY 2012
5 9 NY 2013
6 36 NY 2014
I tried using the diff function:
df <- df %>%
  group_by(state) %>%
  mutate(daily_count = diff(cum_count))
But I get:
Error: Column `daily_count` must be length 3 (the number of rows) or one, not 2
Let me know what you think.
Thanks!
Answer 0 (score: 1)
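diff returns a vector whose length is one less than the original, and mutate requires the output column to have the same length as the original column (or length 1, which can be recycled). To make the lengths match, we can prepend a value, either NA or the first value of 'cum_count':
library(dplyr)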
df %>%
group_by(state)%>%
mutate(daily_count = c(first(cum_count), diff(cum_count)))
# A tibble: 6 x 4
# Groups: state [2]
# cum_count state Year daily_count
# <dbl> <fct> <int> <dbl>
#1 5 Alabama 2012 5
#2 14 Alabama 2013 9
#3 50 Alabama 2014 36
#4 5 NY 2012 5
#5 14 NY 2013 9
#6 50 NY 2014 36
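The NA option mentioned above is not shown in the answer, so here is a minimal sketch of it, assuming the same df as in the question. The name result_na is only illustrative. It also checks the length mismatch that causes the error.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
                 state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
                 Year = c(2012:2014, 2012:2014))

# diff() on a length-3 vector yields only 2 differences,
# which is why mutate() rejects it for a group of 3 rows
n_diffs <- length(diff(c(5, 14, 50)))

# Pad the front of each group with NA so the lengths match
# (result_na is a hypothetical name, not from the original answer)
result_na <- df %>%
  group_by(state) %>%
  mutate(daily_count = c(NA_real_, diff(cum_count))) %>%
  ungroup()
```

With this variant the first row of each state has a missing daily_count instead of the starting cumulative value, which may be preferable if the count before the first year is unknown.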
Alternatively, take the lag of the column and subtract it from the column itself.
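The code for the lag approach is missing from this answer, so the following is only a sketch of how it might look, assuming the same df as in the question. Using default = 0 is my assumption so that the first row of each group keeps its cumulative value; without it the first row would be NA. The name result_lag is illustrative.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
                 state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
                 Year = c(2012:2014, 2012:2014))

# Subtract the previous cumulative value within each state.
# dplyr::lag() shifts the vector by one position; default = 0 (an assumption)
# fills the first position so the first difference equals the first count.
result_lag <- df %>%
  group_by(state) %>%
  mutate(daily_count = cum_count - dplyr::lag(cum_count, default = 0)) %>%
  ungroup()
```

Because lag returns a vector of the same length as its input, no padding is needed here, and this should give the same daily_count values as the first/diff version (5, 9, 36 for each state).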