I have the following data frame:
> dados
  COUNTRY    Year  CO2 emissions  Pop. Growth(%)
  Argentina  1994  1.23           0.3
  Argentina  1995  1.26           0.2
  Argentina  1996  1.28           0.4
  Argentina  1997  1.24           0.2
  Brazil     1994  1.54           0.7
  Brazil     1995  1.59           0.6
  Brazil     1996  1.60           0.9
  Brazil     1997  1.58           1.3
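I want to take the first difference of the variables CO2 emissions and Pop. Growth(%) within each country. I tried dados[,2:4] <- diff(dados[,2:4]), but it returned the error:
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator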
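The error happens because dados[,2:4] is a data frame, and data frames have no diff() method. The default method therefore treats it as a list of columns and tries to subtract one list from another, which is not a numeric operation. Even with numeric input, a single diff() call would also run across country boundaries instead of restarting for each country. The following is a minimal base-R sketch, not part of the original answer. It assumes the rows are already sorted by Year within each COUNTRY. It also assumes the columns have been renamed to CO2_emissions and Pop_Growth. The helper first_diff is hypothetical. The sketch differences each numeric column separately within each country using ave() and pads the first year with NA:

```r
# Example data from the question; CO2_emissions and Pop_Growth are assumed
# syntactic renamings of the original "CO2 emissions" and "Pop. Growth(%)"
dados <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# Hypothetical helper: first difference of a numeric vector,
# keeping the original length by putting NA in the first position
first_diff <- function(x) c(NA, diff(x))

# Difference each column on its own, restarting within each COUNTRY
# (assumes rows are already sorted by Year inside each country)
for (col in c("CO2_emissions", "Pop_Growth")) {
  dados[[col]] <- ave(dados[[col]], dados$COUNTRY, FUN = first_diff)
}

dados
```

Because diff() is called on one numeric vector at a time, it never receives a data frame. Grouping by COUNTRY in ave() ensures that Brazil's first year is compared with nothing rather than with Argentina's last year.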
Answer 0 (score: 1)
Here is a solution with dplyr:
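library(dplyr)

df %>%
  group_by(COUNTRY) %>%
  # first difference x[t] - x[t-1] within each country (NA for the first year);
  # assumes rows are sorted by Year inside each country
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - lag(.x)))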
Result:
# A tibble: 8 x 4
# Groups: COUNTRY [2]
COUNTRY Year CO2_emissions Pop_Growth
<fctr> <int> <dbl> <dbl>
1 Argentina 1994 NA NA
2 Argentina 1995 0.03 -0.1
3 Argentina 1996 0.02 0.2
4 Argentina 1997 -0.04 -0.2
5 Brazil 1994 NA NA
6 Brazil 1995 0.05 -0.1
7 Brazil 1996 0.01 0.3
8 Brazil 1997 -0.02 0.4
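lag() returns the value from the previous row within each group. The result above is therefore correct only because the rows are already sorted by Year. The following minimal sketch is not part of the original answer. It assumes dplyr 1.0 or later, which is needed for across(). It passes order_by = Year to lag(), so each value is compared with the previous year even when the rows arrive shuffled. The shuffled and result objects are illustrative only.

```r
library(dplyr)

# Same data as in the answer, with the rows deliberately put out of order
df <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)
shuffled <- df[c(8, 3, 1, 6, 4, 2, 5, 7), ]

result <- shuffled %>%
  group_by(COUNTRY) %>%
  # order_by = Year makes lag() pick the previous year, not the previous row
  mutate(across(c(CO2_emissions, Pop_Growth), ~ .x - lag(.x, order_by = Year))) %>%
  ungroup() %>%
  arrange(COUNTRY, Year)

result
```

After sorting, the differences match the table above. Without order_by, the shuffled input would produce meaningless differences.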
Data:
df = structure(list(COUNTRY = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), .Label = c("Argentina", "Brazil"), class = "factor"),
Year = c(1994L, 1995L, 1996L, 1997L, 1994L, 1995L, 1996L,
1997L), CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59,
1.6, 1.58), Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6,
0.9, 1.3)), .Names = c("COUNTRY", "Year", "CO2_emissions",
"Pop_Growth"), class = "data.frame", row.names = c(NA, -8L))