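I have a data table with one million records, and I am trying to create a new column based on month.idx:
dt[, new_col := get(paste0("month_", month.idx))]
but it only works for the first row.
Can someone help me solve this problem? Thanks!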
Data:
   id month_1 month_2 month_3 month_4 month_5 month.idx
1: x1       1       1       1       0       1         3
2: x2       0       0       0       1       0         4
3: x3       1       0       0       0       0         1
4: x4       0       0       0       0       0         5
5: x5       1       1       0       0       1         2
6: x6       0       1       0       1       1         3
7: x7       0       0       1       1       1         4
8: x8       0       0       0       0       0         1
9: x9       0       0       0       0       1         5
Results:
   id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1: x1       1       1       1       0       1         3       1
2: x2       0       0       0       1       0         4       0
3: x3       1       0       0       0       0         1       0
4: x4       0       0       0       0       0         5       0
5: x5       1       1       0       0       1         2       0
6: x6       0       1       0       1       1         3       0
7: x7       0       0       1       1       1         4       1
8: x8       0       0       0       0       0         1       0
9: x9       0       0       0       0       1         5       0
Expected:
   id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1: x1       1       1       1       0       1         3       1
2: x2       0       0       0       1       0         4       1
3: x3       1       0       0       0       0         1       1
4: x4       0       0       0       0       0         5       0
5: x5       1       1       0       0       1         2       1
6: x6       0       1       0       1       1         3       0
7: x7       0       0       1       1       1         4       1
8: x8       0       0       0       0       0         1       0
9: x9       0       0       0       0       1         5       1
Answer 0 (score: 2)
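Here are 2 options:
1) Use get within groups of month.idx, as suggested in Frank's comment. Within each group month.idx is a single value, so paste0 builds exactly one column name for get to look up: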
DT[, new_col := get(paste0("month_", month.idx)), by= month.idx]
2) Melt to long format, then join to look up the value:
DT[, variable := paste0("month_", month.idx)]
DT[melt(DT, id.vars="id", measure.vars=patterns("^month_")),
on=.(id, variable), new_col := value]
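To make the grouping in option 1 more concrete, here is a minimal sketch on the first five rows of the question's data. The names col_name and n_rows are illustrative only and do not come from the original post. The first call shows that each month.idx group yields a single column name, which is what lets get return the right values for the rows of that group.

```r
library(data.table)

# First five rows of the question's data
DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1 1 1 1 0 1 3
x2 0 0 0 1 0 4
x3 1 0 0 0 0 1
x4 0 0 0 0 0 5
x5 1 1 0 0 1 2")

# Inside a by-group, month.idx has length 1, so paste0() gives one name per group
# (col_name and n_rows are illustrative names)
print(DT[, .(col_name = paste0("month_", month.idx), n_rows = .N), by = month.idx])

# get() then looks up that one column for the rows in the group
DT[, new_col := get(paste0("month_", month.idx)), by = month.idx]
print(DT)
```

On this subset, new_col should agree with the first five rows of the expected output in the question.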
Which option is faster depends on the number of rows and the number of month columns you have.
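One rough way to compare the two options on your own machine is sketched below. It assumes synthetic data with 100,000 rows and 5 month columns (n, col_get, col_join and long are illustrative names, not from the original post). The timings will vary with data shape and hardware, so treat this as a starting point rather than a benchmark.

```r
library(data.table)

set.seed(1)
n <- 1e5  # assumed size; raise towards 1e6 to mirror the question

# Synthetic table with the same layout as the question's data
DT <- data.table(id = paste0("x", seq_len(n)))
DT[, (paste0("month_", 1:5)) := replicate(5, sample(0:1, .N, replace = TRUE),
                                          simplify = FALSE)]
DT[, month.idx := sample(1:5, .N, replace = TRUE)]

# Option 1: get() by group
t_get <- system.time(
  DT[, col_get := get(paste0("month_", month.idx)), by = month.idx]
)

# Option 2: melt to long format, then update join
t_join <- system.time({
  DT[, variable := paste0("month_", month.idx)]
  long <- melt(DT, id.vars = "id", measure.vars = patterns("^month_"))
  DT[long, on = .(id, variable), col_join := value]
})

print(t_get)
print(t_join)

# Both options should produce the same column
stopifnot(identical(DT$col_get, DT$col_join))
```

Changing n or the number of month columns should show how the balance between the two shifts for your data.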
Data:
library(data.table)
DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1 1 1 1 0 1 3
x2 0 0 0 1 0 4
x3 1 0 0 0 0 1
x4 0 0 0 0 0 5
x5 1 1 0 0 1 2
x6 0 1 0 1 1 3
x7 0 0 1 1 1 4
x8 0 0 0 0 0 1")