Arithmetic operations based on values in another column

Time: 2019-05-02 00:41:58

Tags: r dplyr data.table tidyr

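I have a data frame with a numeric column covering multiple years. The years may not be in order, and the year 5 years later may be missing. Here is a sample data frame: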

df = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ARM"),
            PPT = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56, 54, 645, 6, 4,53, 656, 65, 5563, 646, 6, 66, 54), 
            Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002, 2014, 2004, 2005, 2006, 2007, 1960, 2009, NA, 2011, 2012, 2013, 2014))

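I want to add an extra column based on the difference between the value for a given year and the value for year + 5. For example, if the first year in the Year column is 1960 but there is no PPT data for 1965, the value in new_col should be NA. Likewise, the new_col value for 1990 is 119 (123 - 4), NA for 2000 (no PPT data available for 2005), 19 for 1991, -2 for 1992, and so on.

I have a very convoluted way of doing this in Excel, but I am looking for a simpler solution in R.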

3 Answers:

Answer 0: (score: 2)

We can arrange by 'Year', then take the difference between 'PPT' and the lead of 'PPT', with 'n' specified as 5

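library(dplyr)
df %>%
    # sort by year so that 5 rows ahead means 5 years ahead (assumes consecutive years)
    arrange(Year) %>% 
    # default = 0 leaves the last five rows equal to PPT itself; omit it to get NA there
    mutate(newcol = PPT - lead(PPT, n = 5, default = 0))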
#    code  PPT Year newcol
#1   AFG  123 1990    119
#2   AGO   42 1991     19
#3   ALB   23 1992     -2
#4   AND    5 1993     -1
#5   ARB   23 1994   -611
#6   ARE    4 1995     -1
#7   ARG   23 1996  -5540
#8   ARM   25 1997    -31
#9   ASM    6 1998    -50
#10  ATG  634 1999    -11
#...
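
To make the effect of `default` concrete, here is a minimal sketch on a short vector (the first seven PPT values of this answer's data, assumed to be sorted by consecutive year). The names `ppt`, `diff_na` and `diff_zero` are only for illustration.

```r
library(dplyr)

# first seven PPT values, assumed sorted by consecutive year
ppt <- c(123, 42, 23, 5, 23, 4, 23)

# without `default`, positions with no value five steps ahead become NA
diff_na <- ppt - lead(ppt, n = 5)

# with `default = 0`, those positions fall back to PPT - 0, i.e. PPT itself
diff_zero <- ppt - lead(ppt, n = 5, default = 0)

print(diff_na)
print(diff_zero)
```

Since the question asks for NA when the year + 5 value is unavailable, leaving out `default` may be closer to what is wanted.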

If some 'Year' values are missing, we can use complete to expand the data and then do the mutate

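library(tidyr)
df %>% 
    arrange(Year) %>% 
    # add a row (PPT = NA) for every year missing between the first and last year
    complete(Year = min(Year):max(Year)) %>%
    mutate(newcol = PPT - lead(PPT, n = 5, default = 0)) %>%
    # drop the filler rows again (this also drops original rows with a missing PPT)
    filter(!is.na(PPT))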
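
As a rough illustration of why the expansion step matters, here is a sketch on a small hypothetical data frame (`toy` is not part of the question) in which 1992 to 1994 are absent. It uses `full_seq()` from tidyr instead of `min(Year):max(Year)` and omits `default`, so years without a partner come out as NA.

```r
library(dplyr)
library(tidyr)

# hypothetical data with no rows for 1992-1994
toy <- data.frame(Year = c(1990, 1991, 1995, 1996),
                  PPT  = c(123, 42, 4, 23))

out <- toy %>%
    complete(Year = full_seq(Year, 1)) %>%      # insert 1992-1994 with PPT = NA
    mutate(newcol = PPT - lead(PPT, n = 5)) %>% # row i now lines up with Year + 5
    filter(!is.na(PPT))                         # drop the inserted rows again

print(out)
```

Without the `complete()` step, `lead(PPT, n = 5)` would reach past the end of this four-row frame and every difference would be NA.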

Or using base R (this works by position, so it assumes the rows are already sorted by 'Year' with no gaps)

df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), tail(PPT, 5)))
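
If the years are unsorted, duplicated or partly missing, as in the question's data, a value-based lookup may be more robust than a positional one. The following is a sketch of that idea using `match()`; it is a different technique from the one-liner above, and `toy`, `idx` and `new_col` are hypothetical names.

```r
# hypothetical data: unsorted years, a gap, a missing PPT and a missing Year
toy <- data.frame(
  Year = c(1990, 1995, 1991, 1996, 1960, 2000, NA),
  PPT  = c(123, 4, 42, 23, 656, NA, 5563)
)

# for each row, find the position of the row whose Year equals Year + 5
idx <- match(toy$Year + 5, toy$Year)

# match() would pair NA with NA, so rows with a missing Year are reset explicitly
idx[is.na(toy$Year)] <- NA

# indexing with NA yields NA, so rows without a partner get NA
toy$new_col <- toy$PPT - toy$PPT[idx]

print(toy)
```

When a year occurs more than once, `match()` uses the first occurrence of Year + 5.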

Data

df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND", 
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"), 
    PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56, 
    56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
    Year = 1990:2014), class = "data.frame", row.names = c(NA, 
-25L))

Answer 1: (score: 2)

A data.table solution that also works when years are missing or have gaps...

Sample data

df = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ARM"),
                PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56, 56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54), 
                Year = c(1990:2014))

Code

library(data.table)
#create a data.table with all years from the minimum until the maximum + 5,
#so missing years will get NA
#then perform a by-reference (update) join on these years, by Year
result <- data.table( Year = min(df$Year):(max(df$Year) + 5) )[setDT(df), `:=`(code = i.code, PPT = i.PPT), on = .(Year)]
#calculate the desired column, delete unwanted rows
result[, newcol := PPT - shift(PPT, 5, type = "lead" )][!is.na(code),][]

Output

#     Year code  PPT newcol
#  1: 1990  AFG  123    119
#  2: 1991  AGO   42     19
#  3: 1992  ALB   23     -2
#  4: 1993  AND    5     -1
#  5: 1994  ARB   23   -611
#  6: 1995  ARE    4     -1
#  7: 1996  ARG   23  -5540
#  8: 1997  ARM   25    -31
#  9: 1998  ASM    6    -50
# 10: 1999  ATG  634    -11
# 11: 2000  AUS    5     -1
# 12: 2001  AUT 5563   5559
# 13: 2002  AUT   56   -600
# 14: 2003  AUT   56   -589
# 15: 2004  AUT  645    580
# 16: 2005  ABW    6  -5557
# 17: 2006  AFG    4   -642
# 18: 2007  AGO  656    650
# 19: 2008  ALB  645    579
# 20: 2009  AND   65     11
# 21: 2010  ARB 5563     NA
# 22: 2011  ARE  646     NA
# 23: 2012  ARG    6     NA
# 24: 2013  ARM   66     NA
# 25: 2014  ARM   54     NA
#     Year code  PPT newcol
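
A variation on the same idea, sketched here under the assumption that each year occurs at most once, is to skip the full year grid and join the table to a copy of itself with the years shifted back by 5. The names `dt`, `lookup` and `PPT_plus5` are hypothetical and not part of the answer above.

```r
library(data.table)

# hypothetical data with no rows for 1992-1994
dt <- data.table(Year = c(1990, 1991, 1995, 1996),
                 PPT  = c(123, 42, 4, 23))

# shifted copy: a row for year Y + 5 is relabelled as year Y
lookup <- dt[, .(Year = Year - 5, PPT_plus5 = PPT)]

# update join: bring the Year + 5 value onto each row where one exists
dt[lookup, PPT_plus5 := i.PPT_plus5, on = "Year"]

# rows without a partner keep PPT_plus5 = NA, so the difference is NA
dt[, new_col := PPT - PPT_plus5]

print(dt)
```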

Answer 2: (score: 1)

We can also use mapply
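
The code for this answer is not shown here. As a minimal sketch of how `mapply` could be applied to the problem (an illustration only, not the answerer's own code, using a hypothetical data frame `toy`):

```r
# hypothetical data with no rows for 1992-1994
toy <- data.frame(Year = c(1990, 1991, 1995, 1996),
                  PPT  = c(123, 42, 4, 23))

# for each (Year, PPT) pair, look up the PPT recorded for Year + 5
toy$new_col <- mapply(function(year, ppt) {
  later <- toy$PPT[which(toy$Year == year + 5)]
  # no row for Year + 5: return NA; otherwise use the first match
  if (length(later) == 0) NA_real_ else ppt - later[1]
}, toy$Year, toy$PPT)

print(toy)
```

`which()` drops NA comparisons, so a row with a missing Year simply finds no partner and gets NA.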