Get the value of a column when the column name is based on another column's value

Time: 2019-03-27 21:33:14

Tags: r get data.table

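I have a data.table with one million records, and I am trying to create a new column based on month.idx:

dt[, new_col := get(paste0("month_", month.idx))]

but it only works for the first row: every row gets its value from the column selected by the first row's month.idx.

Can anyone help me solve this? Thanks!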

Data
    id month_1 month_2 month_3 month_4 month_5 month.idx
1:  x1       1       1       1       0       1         3
2:  x2       0       0       0       1       0         4
3:  x3       1       0       0       0       0         1
4:  x4       0       0       0       0       0         5
5:  x5       1       1       0       0       1         2
6:  x6       0       1       0       1       1         3
7:  x7       0       0       1       1       1         4
8:  x8       0       0       0       0       0         1
9:  x9       0       0       0       0       1         5

results:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       0
3:  x3       1       0       0       0       0         1       0
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       0
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       1
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       0

expected:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       1
3:  x3       1       0       0       0       0         1       1
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       1
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       1
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       1

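1 answer:

Answer 0 (score: 2)

Here are 2 options:

1) Use get by group, following Frank's comment. get() resolves a single name, so grouping by month.idx hands it exactly one column name per group: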

DT[, new_col := get(paste0("month_", month.idx)), by= month.idx]

2) Melt to long format, then join to do the lookup:

DT[, variable := paste0("month_", month.idx)]
DT[melt(DT, id.vars="id", measure.vars=patterns("^month_")), 
    on=.(id, variable), new_col := value]
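
As a rough check of the two options above, the sketch below runs both on a small made-up table and compares the results. The table contents and the names via_get, via_join and long are illustrative assumptions, not part of the original answer; variable.factor = FALSE and the i. prefix are added here only to make the join explicit.

```r
library(data.table)

# Made-up table with the same layout as the question's data
DT <- data.table(
  id        = c("a", "b", "c", "d"),
  month_1   = c(1L, 0L, 0L, 1L),
  month_2   = c(0L, 1L, 0L, 0L),
  month_3   = c(0L, 0L, 1L, 0L),
  month.idx = c(1L, 2L, 3L, 2L)
)

# Option 1: inside each group month.idx is a single value,
# so paste0() yields one column name for get()
DT[, via_get := get(paste0("month_", month.idx)), by = month.idx]

# Option 2: build the lookup key, melt to long format, then update join
DT[, variable := paste0("month_", month.idx)]
long <- melt(DT, id.vars = "id", measure.vars = patterns("^month_"),
             variable.factor = FALSE)
# i.value refers to the value column of `long`
DT[long, on = .(id, variable), via_join := i.value]

# Both columns should hold 1, 1, 1, 0 for ids a, b, c, d
DT[, .(id, month.idx, via_get, via_join)]
```

Row d reads month_2 (its month.idx is 2), which is 0, so both columns end in 0.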

Which one is faster depends on the number of rows and the number of month columns you have.
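
One way to compare the two on your own machine is a quick timing sketch like the following. The row count n, the column count k, the random data, and the column names by_group and melt_join are assumptions chosen for illustration; no timings are claimed here, since they vary with hardware and data shape.

```r
library(data.table)

set.seed(1)
n <- 1e5  # assumed number of rows
k <- 5    # assumed number of month columns

# Random 0/1 month columns plus a random index into them
DT <- data.table(id = seq_len(n))
for (j in seq_len(k)) {
  set(DT, j = paste0("month_", j), value = sample(0:1, n, replace = TRUE))
}
DT[, month.idx := sample(k, n, replace = TRUE)]

# Option 1: get() by group
t_get <- system.time(
  DT[, by_group := get(paste0("month_", month.idx)), by = month.idx]
)

# Option 2: melt, then update join
t_join <- system.time({
  DT[, variable := paste0("month_", month.idx)]
  long <- melt(DT, id.vars = "id", measure.vars = patterns("^month_"),
               variable.factor = FALSE)
  DT[long, on = .(id, variable), melt_join := i.value]
})

# The two approaches should agree before timings are compared
stopifnot(identical(DT$by_group, DT$melt_join))
print(rbind(get_by_group = t_get, melt_join = t_join))
```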

Data:

library(data.table)

DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1       1       1       1       0       1         3
x2       0       0       0       1       0         4
x3       1       0       0       0       0         1
x4       0       0       0       0       0         5
x5       1       1       0       0       1         2
x6       0       1       0       1       1         3
x7       0       0       1       1       1         4
x8       0       0       0       0       0         1
x9       0       0       0       0       1         5")