Successively subtracting columns in a data.table

Time: 2017-12-21 13:08:39

Tags: r data.table

Suppose I have the following data.table:

 player_id prestige_score_0 prestige_score_1 prestige_score_2 prestige_score_3 prestige_score_4
   1:    100284     0.0001774623     2.519792e-03     5.870781e-03     7.430179e-03     7.937716e-03
   2:    103819     0.0001774623     1.426482e-03     3.904329e-03     5.526974e-03     6.373850e-03
   3:    100656     0.0001774623     2.142518e-03     4.221423e-03     5.822705e-03     6.533448e-03
   4:    104745     0.0001774623     1.084913e-03     3.061197e-03     4.383649e-03     5.091851e-03
   5:    104925     0.0001774623     1.488457e-03     2.926728e-03     4.360301e-03     5.068171e-03

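I want to compute the difference between the values in each pair of consecutive columns, starting from column prestige_score_0.

A single step looks like this: df[, prestige_score_0] - df[, prestige_score_1]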

How can I do this in data.table (saving the differences as a data.table and keeping player_id)?

3 Answers:

Answer 0: (score: 2)

Here is a way to do this in a tidy fashion:

# make it tidy
df2 <- melt(df, 
            id.vars = "player_id", 
            variable.name = "column_name", 
            value.name = "prestige_score")  
# extract numbers from column names
df2[, score_number := as.numeric(gsub("prestige_score_", "", column_name))]
# compute differences by player
df2[, diff := prestige_score - shift(prestige_score, n = 1L, type = "lead"),
    by = player_id]

# if necessary, reshape back to original format
dcast(df2, player_id ~ score_number, value.var = c("prestige_score", "diff"))
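The reshape above leaves an NA difference for the last score column, because it has no following column to subtract. As a minimal sketch of one way to keep only the real differences alongside player_id, the block below uses two rows and three score columns copied from the question. The names `long`, `score_diff`, `pair` and the `diff_0_1` labelling are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Two rows and three score columns copied from the question's table
df <- data.table(
  player_id        = c(100284, 103819),
  prestige_score_0 = c(0.0001774623, 0.0001774623),
  prestige_score_1 = c(2.519792e-03, 1.426482e-03),
  prestige_score_2 = c(5.870781e-03, 3.904329e-03)
)

# long format: one row per player and score column
long <- melt(df, id.vars = "player_id",
             variable.name = "column_name", value.name = "prestige_score")
long[, score_number := as.integer(sub("prestige_score_", "", column_name))]
setorder(long, player_id, score_number)

# score_k minus score_(k+1) within each player; the last score has no successor, so it gets NA
long[, score_diff := prestige_score - shift(prestige_score, type = "lead"), by = player_id]

# hypothetical label for each pair of columns, e.g. "diff_0_1"
long[, pair := paste0("diff_", score_number, "_", score_number + 1L)]

# keep only the non-NA differences and reshape to one row per player
diffs <- dcast(long[!is.na(score_diff)], player_id ~ pair, value.var = "score_diff")
print(diffs)
```

With more than ten score columns the `pair` labels would sort as text (for example `diff_10_11` before `diff_1_2`), so the columns may need reordering afterwards.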

Answer 1: (score: 0)

You can use a for loop:

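# one id column plus scores numbered from 0, so the last score index is ncol(df) - 2
for (i in 1:(ncol(df) - 2)) {
    # df[[name]] extracts a column as a vector; set() adds the new column by reference
    set(df, j = paste0("diff_", i - 1, "_", i),
        value = df[[paste0("prestige_score_", i - 1)]] -
                df[[paste0("prestige_score_", i)]])
}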

This may not be the most efficient approach if you have many columns.
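As a loop-free variant of the same idea, the sketch below pairs each score column with its successor and adds all difference columns in one `:=` call. It assumes the score columns already appear in numeric order in the table; `score_cols`, `left`, `right` and `diff_names` are hypothetical helper names, and the data is a small excerpt of the question's table.

```r
library(data.table)

# Two rows and three score columns copied from the question's table
df <- data.table(
  player_id        = c(100284, 103819),
  prestige_score_0 = c(0.0001774623, 0.0001774623),
  prestige_score_1 = c(2.519792e-03, 1.426482e-03),
  prestige_score_2 = c(5.870781e-03, 3.904329e-03)
)

# assumption: the score columns are stored in increasing numeric order
score_cols <- grep("^prestige_score_", names(df), value = TRUE)
left  <- head(score_cols, -1)   # prestige_score_0, prestige_score_1, ...
right <- tail(score_cols, -1)   # prestige_score_1, prestige_score_2, ...
diff_names <- paste0("diff_", seq_along(left) - 1L, "_", seq_along(left))

# one vector of differences per pair of neighbouring columns
diffs <- Map(function(a, b) df[[a]] - df[[b]], left, right)

# add all difference columns by reference in a single call
df[, (diff_names) := diffs]
print(df)
```

Whether this is noticeably faster than the loop depends on the number of columns; for a handful of score columns either version should be fine.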

Answer 2: (score: 0)

You can subtract a shifted copy of the whole dt from itself:

dt = data.table(id=c("A","B"),matrix(rexp(10, rate=.1), ncol=5))
dt_shift = data.table(id=dt[,id], dt[, 2:(ncol(dt)-1)] - dt[,3:ncol(dt)])
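Applied to the question's layout, the same shifted subtraction might look like the sketch below, which keeps player_id and gives the result descriptive column names. Selecting the score columns by name pattern, the `diff_0_1` naming and the variable names `score_cols`, `diff_dt` and `result` are assumptions made for illustration; the data is a small excerpt of the question's table.

```r
library(data.table)

# Two rows and three score columns copied from the question's table
df <- data.table(
  player_id        = c(100284, 103819),
  prestige_score_0 = c(0.0001774623, 0.0001774623),
  prestige_score_1 = c(2.519792e-03, 1.426482e-03),
  prestige_score_2 = c(5.870781e-03, 3.904329e-03)
)

# assumption: the score columns are stored in increasing numeric order
score_cols <- grep("^prestige_score_", names(df), value = TRUE)

# all score columns but the last, minus all score columns but the first
diff_dt <- as.data.table(
  df[, head(score_cols, -1), with = FALSE] - df[, tail(score_cols, -1), with = FALSE]
)
n_diff <- ncol(diff_dt)
setnames(diff_dt, paste0("diff_", seq_len(n_diff) - 1L, "_", seq_len(n_diff)))

# new data.table with player_id followed by the difference columns
result <- data.table(player_id = df$player_id, diff_dt)
print(result)
```

Unlike the loop, this leaves `df` untouched and returns the differences as a separate table, which is what the question asked for.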