Question

I have a data.table with one million records, and I am trying to create a new column based on month.idx:

dt[, new_col := get(paste0("month_", month.idx))]

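but it only works for the first row: every row ends up with the value from month_3, the column chosen by the first row's month.idx (see the results below).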

Can anyone help me solve this? Thanks!

Data
    id month_1 month_2 month_3 month_4 month_5 month.idx
1:  x1       1       1       1       0       1         3
2:  x2       0       0       0       1       0         4
3:  x3       1       0       0       0       0         1
4:  x4       0       0       0       0       0         5
5:  x5       1       1       0       0       1         2
6:  x6       0       1       0       1       1         3
7:  x7       0       0       1       1       1         4
8:  x8       0       0       0       0       0         1
9:  x9       0       0       0       0       1         5

results:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       0
3:  x3       1       0       0       0       0         1       0
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       0
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       1
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       0

expected:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       1
3:  x3       1       0       0       0       0         1       1
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       1
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       1
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       1

Answer 1

Here are two options:

1) Use get() once per group of month.idx, as Frank suggested in a comment:

DT[, new_col := get(paste0("month_", month.idx)), by= month.idx]
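A self-contained run of option 1 on all nine rows from the question, as a sketch that assumes the data.table package is installed. Grouping with by = month.idx means each group has a single index value, so paste0() builds exactly one column name per group and get() returns that column for every row in the group, instead of reusing the first row's column for the whole table:

```r
library(data.table)

# The nine rows from the question
DT <- data.table(
  id        = paste0("x", 1:9),
  month_1   = c(1, 0, 1, 0, 1, 0, 0, 0, 0),
  month_2   = c(1, 0, 0, 0, 1, 1, 0, 0, 0),
  month_3   = c(1, 0, 0, 0, 0, 0, 1, 0, 0),
  month_4   = c(0, 1, 0, 0, 0, 1, 1, 0, 0),
  month_5   = c(1, 0, 0, 0, 1, 1, 1, 0, 1),
  month.idx = c(3, 4, 1, 5, 2, 3, 4, 1, 5)
)

# Inside each group, month.idx is a single value, so get() receives one column name
DT[, new_col := get(paste0("month_", month.idx)), by = month.idx]
print(DT)
```

Each row's new_col is the value of month_<month.idx> from that same row.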

2) Melt to long format, then look up the values with a join:

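# Name of the month column each row should read from
DT[, variable := paste0("month_", month.idx)]
# Melt the month columns to long format, then update-join on (id, variable)
DT[melt(DT, id.vars="id", measure.vars=patterns("^month_")), 
    on=.(id, variable), new_col := value]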
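The same lookup via option 2, again as a self-contained sketch on the question's data (data.table assumed installed). It deviates slightly from the answer's code, a choice made here only to keep the types explicit: the month columns are selected with grep() instead of patterns(), and melt() is called with variable.factor = FALSE so the melted variable column is character, the same type as the variable column built in DT. Each row of long is one (id, month column) pair, and the update-join copies the matching value into new_col:

```r
library(data.table)

# The nine rows from the question
DT <- data.table(
  id        = paste0("x", 1:9),
  month_1   = c(1, 0, 1, 0, 1, 0, 0, 0, 0),
  month_2   = c(1, 0, 0, 0, 1, 1, 0, 0, 0),
  month_3   = c(1, 0, 0, 0, 0, 0, 1, 0, 0),
  month_4   = c(0, 1, 0, 0, 0, 1, 1, 0, 0),
  month_5   = c(1, 0, 0, 0, 1, 1, 1, 0, 1),
  month.idx = c(3, 4, 1, 5, 2, 3, 4, 1, 5)
)

# Name of the month column each row should read from
DT[, variable := paste0("month_", month.idx)]

# Long format: one row per (id, month column) holding that cell's value
month_cols <- grep("^month_", names(DT), value = TRUE)  # excludes month.idx
long <- melt(DT, id.vars = "id", measure.vars = month_cols,
             variable.factor = FALSE)

# Update-join: each DT row picks up the long row with the same id and variable
DT[long, on = .(id, variable), new_col := value]
DT[, variable := NULL]  # drop the helper column
print(DT)
```

Dropping the helper column at the end is optional; it just leaves DT with the same columns as the expected output.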

Which option is faster depends on how many rows and how many month columns you have.
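To check which option is faster on data shaped like yours, here is a rough timing sketch. The table is simulated: the row count n, the number of month columns k and the random 0/1 values are all hypothetical placeholders, so set n and k to match your own table (for example n <- 1e6 for the million-row case). It also confirms that both options produce the same new_col:

```r
library(data.table)

# Hypothetical sizes: set these to match your own table
n <- 1e5   # number of rows
k <- 12    # number of month columns
set.seed(42)

month_cols <- paste0("month_", seq_len(k))

# Simulated table: random 0/1 month columns and a random month.idx per row
base_dt <- data.table(id = paste0("x", seq_len(n)))
for (col in month_cols) {
  set(base_dt, j = col, value = sample(0:1, n, replace = TRUE))
}
base_dt[, month.idx := sample(k, n, replace = TRUE)]

# Option 1: get() once per group of month.idx
dt1 <- copy(base_dt)
t1 <- system.time(
  dt1[, new_col := get(paste0("month_", month.idx)), by = month.idx]
)

# Option 2: melt to long format, then update-join on (id, variable)
dt2 <- copy(base_dt)
t2 <- system.time({
  dt2[, variable := paste0("month_", month.idx)]
  long <- melt(dt2, id.vars = "id", measure.vars = month_cols,
               variable.factor = FALSE)
  dt2[long, on = .(id, variable), new_col := value]
})

# Both options must agree before their timings are worth comparing
stopifnot(identical(dt1$new_col, dt2$new_col))
print(c(get_by_group = t1[["elapsed"]], melt_join = t2[["elapsed"]]))
```

system.time() results vary between runs and machines, so repeat the comparison a few times before drawing conclusions. Option 2 materialises n * k rows in long, so its cost grows with the number of month columns as well as the number of rows.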

Data:

library(data.table)

DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1       1       1       1       0       1         3
x2       0       0       0       1       0         4
x3       1       0       0       0       0         1
x4       0       0       0       0       0         5
x5       1       1       0       0       1         2
x6       0       1       0       1       1         3
x7       0       0       1       1       1         4
x8       0       0       0       0       0         1
x9       0       0       0       0       1         5")

Get a column's value when the column name is based on the value of another column
