我有一个data.table
,看起来像这样:
df
seller_id buyer_id hs10 year mean_ln_c_hit
1: 1 5 1200000000 5 4.407456
2: 1 24 1500000000 2 4.173422
3: 1 8 1500000000 3 4.532695
4: 1 8 1500000000 5 4.830106
5: 1 16 1500000000 5 4.830106
6: 1 17 1800000000 3 3.920435
7: 1 20 1800000000 5 4.357196
8: 1 27 1800000000 5 4.357196
9: 1 1 1900000000 1 4.835762
10: 1 2 1900000000 1 4.835762
11: 1 3 1900000000 1 4.835762
12: 1 4 1900000000 1 4.835762
13: 1 5 1900000000 1 4.835762
14: 1 6 1900000000 1 4.835762
15: 1 7 1900000000 1 4.835762
16: 1 8 1900000000 1 4.835762
17: 1 9 1900000000 1 4.835762
18: 1 11 1900000000 1 4.835762
19: 1 12 1900000000 1 4.835762
20: 1 13 1900000000 1 4.835762
21: 1 14 1900000000 1 4.835762
22: 1 15 1900000000 1 4.835762
23: 1 16 1900000000 1 4.835762
24: 1 17 1900000000 1 4.835762
25: 1 18 1900000000 1 4.835762
26: 1 19 1900000000 1 4.835762
27: 1 20 1900000000 1 4.835762
28: 1 21 1900000000 1 4.835762
29: 1 22 1900000000 1 4.835762
30: 1 23 1900000000 1 4.835762
31: 1 24 1900000000 1 4.835762
32: 1 25 1900000000 1 4.835762
33: 1 26 1900000000 1 4.835762
34: 1 27 1900000000 1 4.835762
35: 1 28 1900000000 1 4.835762
36: 1 29 1900000000 1 4.835762
37: 1 30 1900000000 1 4.835762
38: 1 1 1900000000 2 4.409253
39: 1 2 1900000000 2 4.409253
40: 1 3 1900000000 2 4.409253
41: 1 4 1900000000 2 4.409253
42: 1 5 1900000000 2 4.409253
43: 1 6 1900000000 2 4.409253
44: 1 7 1900000000 2 4.409253
45: 1 8 1900000000 2 4.409253
46: 1 9 1900000000 2 4.409253
47: 1 10 1900000000 2 4.409253
48: 1 11 1900000000 2 4.409253
49: 1 12 1900000000 2 4.409253
50: 1 13 1900000000 2 4.409253
51: 1 14 1900000000 2 4.409253
52: 1 15 1900000000 2 4.409253
53: 1 16 1900000000 2 4.409253
54: 1 17 1900000000 2 4.409253
55: 1 18 1900000000 2 4.409253
56: 1 19 1900000000 2 4.409253
57: 1 20 1900000000 2 4.409253
58: 1 21 1900000000 2 4.409253
59: 1 22 1900000000 2 4.409253
60: 1 23 1900000000 2 4.409253
61: 1 25 1900000000 2 4.409253
62: 1 26 1900000000 2 4.409253
63: 1 27 1900000000 2 4.409253
64: 1 28 1900000000 2 4.409253
65: 1 29 1900000000 2 4.409253
66: 1 30 1900000000 2 4.409253
67: 1 1 1900000000 3 4.514642
68: 1 3 1900000000 3 4.514642
69: 1 4 1900000000 3 4.514642
70: 1 5 1900000000 3 4.514642
71: 1 6 1900000000 3 4.514642
72: 1 7 1900000000 3 4.514642
73: 1 9 1900000000 3 4.514642
74: 1 11 1900000000 3 4.514642
75: 1 12 1900000000 3 4.514642
76: 1 13 1900000000 3 4.514642
77: 1 14 1900000000 3 4.514642
78: 1 15 1900000000 3 4.514642
79: 1 16 1900000000 3 4.514642
80: 1 18 1900000000 3 4.514642
81: 1 19 1900000000 3 4.514642
82: 1 20 1900000000 3 4.514642
83: 1 21 1900000000 3 4.514642
84: 1 22 1900000000 3 4.514642
85: 1 23 1900000000 3 4.514642
86: 1 24 1900000000 3 4.514642
87: 1 25 1900000000 3 4.514642
88: 1 26 1900000000 3 4.514642
89: 1 27 1900000000 3 4.514642
90: 1 28 1900000000 3 4.514642
91: 1 29 1900000000 3 4.514642
92: 1 30 1900000000 3 4.514642
93: 1 2 1900000000 5 4.698335
94: 1 3 1900000000 5 4.698335
95: 1 4 1900000000 5 4.698335
96: 1 6 1900000000 5 4.698335
97: 1 7 1900000000 5 4.698335
98: 1 9 1900000000 5 4.698335
99: 1 11 1900000000 5 4.698335
100: 1 12 1900000000 5 4.698335
我想使用data.table
功能创建一个名为lag_mean_ln_c_hit
的新列,其中包含列lag
中值的一年滞后mean_ln_c_hit
,条件是seller_id
和hs10
。我最好的尝试是:
df[!is.na(year), lag_mean_ln_c_hit:=
(.SD[.(seller_id = seller_id, hs10 = hs10, year = year - 1), mean_ln_c_hit, on = c("seller_id", "hs10", "year"), allow.cartesian = TRUE ]) ]
哪种类型的图片可以正确显示NAs
(即year = 1
是第一年,因此第一year
的所有观察值的滞后时间应为NA),但失败捕捉正确的时滞。对于除4.835762
以外的任何其他组合,它始终报告NA
(尽管一旦报告4.173422
,则为该组合)。
每个seller_id
具有多个buyer_id
值的mean_ln_c_hit
条目。因此,我有重复。运行代码,我得到以下Warning as well
Warning message:
In `[.data.table`(df, !is.na(year), `:=`(lag_mean_ln_c_hit, (.SD[.(seller_id = seller_id, :
Supplied 1640 items to be assigned to 100 items of column 'lag_mean_ln_c_hit' (1540 unused)
没有allow.cartesian = TRUE
,代码将无法运行。
关于如何解决此问题的任何想法?它应该很简单,但是我无法理解。
答案 0 :(得分:1)
不清楚所需的输出是什么。如果必须在一年前,可以选择以下方法:
seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, oneyearago := year - 1L]
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year=oneyearago), x.mean_ln_c_hit]]
输出:
seller_id buyer_id hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit
1: 1 5 1200000000 5 4.407456 4 NA
2: 1 24 1500000000 2 4.173422 1 NA
3: 1 8 1500000000 3 4.532695 2 4.173422
4: 1 8 1500000000 5 4.830106 4 NA
5: 1 16 1500000000 5 4.830106 4 NA
6: 1 17 1800000000 3 3.920435 2 NA
7: 1 20 1800000000 5 4.357196 4 NA
8: 1 27 1800000000 5 4.357196 4 NA
9: 1 1 1900000000 1 4.835762 0 NA
10: 1 2 1900000000 1 4.835762 0 NA
11: 1 3 1900000000 1 4.835762 0 NA
12: 1 4 1900000000 1 4.835762 0 NA
13: 1 5 1900000000 1 4.835762 0 NA
14: 1 6 1900000000 1 4.835762 0 NA
15: 1 7 1900000000 1 4.835762 0 NA
16: 1 8 1900000000 1 4.835762 0 NA
17: 1 9 1900000000 1 4.835762 0 NA
18: 1 11 1900000000 1 4.835762 0 NA
19: 1 12 1900000000 1 4.835762 0 NA
20: 1 13 1900000000 1 4.835762 0 NA
21: 1 14 1900000000 1 4.835762 0 NA
22: 1 15 1900000000 1 4.835762 0 NA
23: 1 16 1900000000 1 4.835762 0 NA
24: 1 17 1900000000 1 4.835762 0 NA
25: 1 18 1900000000 1 4.835762 0 NA
26: 1 19 1900000000 1 4.835762 0 NA
27: 1 20 1900000000 1 4.835762 0 NA
28: 1 21 1900000000 1 4.835762 0 NA
29: 1 22 1900000000 1 4.835762 0 NA
30: 1 23 1900000000 1 4.835762 0 NA
31: 1 24 1900000000 1 4.835762 0 NA
32: 1 25 1900000000 1 4.835762 0 NA
33: 1 26 1900000000 1 4.835762 0 NA
34: 1 27 1900000000 1 4.835762 0 NA
35: 1 28 1900000000 1 4.835762 0 NA
36: 1 29 1900000000 1 4.835762 0 NA
37: 1 30 1900000000 1 4.835762 0 NA
38: 1 1 1900000000 2 4.409253 1 4.835762
39: 1 2 1900000000 2 4.409253 1 4.835762
40: 1 3 1900000000 2 4.409253 1 4.835762
41: 1 4 1900000000 2 4.409253 1 4.835762
42: 1 5 1900000000 2 4.409253 1 4.835762
43: 1 6 1900000000 2 4.409253 1 4.835762
44: 1 7 1900000000 2 4.409253 1 4.835762
45: 1 8 1900000000 2 4.409253 1 4.835762
46: 1 9 1900000000 2 4.409253 1 4.835762
47: 1 10 1900000000 2 4.409253 1 4.835762
48: 1 11 1900000000 2 4.409253 1 4.835762
49: 1 12 1900000000 2 4.409253 1 4.835762
50: 1 13 1900000000 2 4.409253 1 4.835762
51: 1 14 1900000000 2 4.409253 1 4.835762
52: 1 15 1900000000 2 4.409253 1 4.835762
53: 1 16 1900000000 2 4.409253 1 4.835762
54: 1 17 1900000000 2 4.409253 1 4.835762
55: 1 18 1900000000 2 4.409253 1 4.835762
56: 1 19 1900000000 2 4.409253 1 4.835762
57: 1 20 1900000000 2 4.409253 1 4.835762
58: 1 21 1900000000 2 4.409253 1 4.835762
59: 1 22 1900000000 2 4.409253 1 4.835762
60: 1 23 1900000000 2 4.409253 1 4.835762
61: 1 25 1900000000 2 4.409253 1 4.835762
62: 1 26 1900000000 2 4.409253 1 4.835762
63: 1 27 1900000000 2 4.409253 1 4.835762
64: 1 28 1900000000 2 4.409253 1 4.835762
65: 1 29 1900000000 2 4.409253 1 4.835762
66: 1 30 1900000000 2 4.409253 1 4.835762
67: 1 1 1900000000 3 4.514642 2 4.409253
68: 1 3 1900000000 3 4.514642 2 4.409253
69: 1 4 1900000000 3 4.514642 2 4.409253
70: 1 5 1900000000 3 4.514642 2 4.409253
71: 1 6 1900000000 3 4.514642 2 4.409253
72: 1 7 1900000000 3 4.514642 2 4.409253
73: 1 9 1900000000 3 4.514642 2 4.409253
74: 1 11 1900000000 3 4.514642 2 4.409253
75: 1 12 1900000000 3 4.514642 2 4.409253
76: 1 13 1900000000 3 4.514642 2 4.409253
77: 1 14 1900000000 3 4.514642 2 4.409253
78: 1 15 1900000000 3 4.514642 2 4.409253
79: 1 16 1900000000 3 4.514642 2 4.409253
80: 1 18 1900000000 3 4.514642 2 4.409253
81: 1 19 1900000000 3 4.514642 2 4.409253
82: 1 20 1900000000 3 4.514642 2 4.409253
83: 1 21 1900000000 3 4.514642 2 4.409253
84: 1 22 1900000000 3 4.514642 2 4.409253
85: 1 23 1900000000 3 4.514642 2 4.409253
86: 1 24 1900000000 3 4.514642 2 4.409253
87: 1 25 1900000000 3 4.514642 2 4.409253
88: 1 26 1900000000 3 4.514642 2 4.409253
89: 1 27 1900000000 3 4.514642 2 4.409253
90: 1 28 1900000000 3 4.514642 2 4.409253
91: 1 29 1900000000 3 4.514642 2 4.409253
92: 1 30 1900000000 3 4.514642 2 4.409253
93: 1 2 1900000000 5 4.698335 4 NA
94: 1 3 1900000000 5 4.698335 4 NA
95: 1 4 1900000000 5 4.698335 4 NA
96: 1 6 1900000000 5 4.698335 4 NA
97: 1 7 1900000000 5 4.698335 4 NA
98: 1 9 1900000000 5 4.698335 4 NA
99: 1 11 1900000000 5 4.698335 4 NA
100: 1 12 1900000000 5 4.698335 4 NA
seller_id buyer_id hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit
或者至少在一年前,您可以使用非等额联接并选择最后一场比赛:
seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year<year), x.mean_ln_c_hit, mult="last"]]
数据:
library(data.table)
df <- fread("seller_id buyer_id hs10 year mean_ln_c_hit
1 5 1200000000 5 4.407456
1 24 1500000000 2 4.173422
1 8 1500000000 3 4.532695
1 8 1500000000 5 4.830106
1 16 1500000000 5 4.830106
1 17 1800000000 3 3.920435
1 20 1800000000 5 4.357196
1 27 1800000000 5 4.357196
1 1 1900000000 1 4.835762
1 2 1900000000 1 4.835762
1 3 1900000000 1 4.835762
1 4 1900000000 1 4.835762
1 5 1900000000 1 4.835762
1 6 1900000000 1 4.835762
1 7 1900000000 1 4.835762
1 8 1900000000 1 4.835762
1 9 1900000000 1 4.835762
1 11 1900000000 1 4.835762
1 12 1900000000 1 4.835762
1 13 1900000000 1 4.835762
1 14 1900000000 1 4.835762
1 15 1900000000 1 4.835762
1 16 1900000000 1 4.835762
1 17 1900000000 1 4.835762
1 18 1900000000 1 4.835762
1 19 1900000000 1 4.835762
1 20 1900000000 1 4.835762
1 21 1900000000 1 4.835762
1 22 1900000000 1 4.835762
1 23 1900000000 1 4.835762
1 24 1900000000 1 4.835762
1 25 1900000000 1 4.835762
1 26 1900000000 1 4.835762
1 27 1900000000 1 4.835762
1 28 1900000000 1 4.835762
1 29 1900000000 1 4.835762
1 30 1900000000 1 4.835762
1 1 1900000000 2 4.409253
1 2 1900000000 2 4.409253
1 3 1900000000 2 4.409253
1 4 1900000000 2 4.409253
1 5 1900000000 2 4.409253
1 6 1900000000 2 4.409253
1 7 1900000000 2 4.409253
1 8 1900000000 2 4.409253
1 9 1900000000 2 4.409253
1 10 1900000000 2 4.409253
1 11 1900000000 2 4.409253
1 12 1900000000 2 4.409253
1 13 1900000000 2 4.409253
1 14 1900000000 2 4.409253
1 15 1900000000 2 4.409253
1 16 1900000000 2 4.409253
1 17 1900000000 2 4.409253
1 18 1900000000 2 4.409253
1 19 1900000000 2 4.409253
1 20 1900000000 2 4.409253
1 21 1900000000 2 4.409253
1 22 1900000000 2 4.409253
1 23 1900000000 2 4.409253
1 25 1900000000 2 4.409253
1 26 1900000000 2 4.409253
1 27 1900000000 2 4.409253
1 28 1900000000 2 4.409253
1 29 1900000000 2 4.409253
1 30 1900000000 2 4.409253
1 1 1900000000 3 4.514642
1 3 1900000000 3 4.514642
1 4 1900000000 3 4.514642
1 5 1900000000 3 4.514642
1 6 1900000000 3 4.514642
1 7 1900000000 3 4.514642
1 9 1900000000 3 4.514642
1 11 1900000000 3 4.514642
1 12 1900000000 3 4.514642
1 13 1900000000 3 4.514642
1 14 1900000000 3 4.514642
1 15 1900000000 3 4.514642
1 16 1900000000 3 4.514642
1 18 1900000000 3 4.514642
1 19 1900000000 3 4.514642
1 20 1900000000 3 4.514642
1 21 1900000000 3 4.514642
1 22 1900000000 3 4.514642
1 23 1900000000 3 4.514642
1 24 1900000000 3 4.514642
1 25 1900000000 3 4.514642
1 26 1900000000 3 4.514642
1 27 1900000000 3 4.514642
1 28 1900000000 3 4.514642
1 29 1900000000 3 4.514642
1 30 1900000000 3 4.514642
1 2 1900000000 5 4.698335
1 3 1900000000 5 4.698335
1 4 1900000000 5 4.698335
1 6 1900000000 5 4.698335
1 7 1900000000 5 4.698335
1 9 1900000000 5 4.698335
1 11 1900000000 5 4.698335
1 12 1900000000 5 4.698335")