I have a data frame that looks like this:
weekyear Location_Id priceA priceB
1 20101 6367 0.8712934 8
2 20101 6380 0.1712934 8
3 20102 6367 0.8712934 4
4 20102 6380 0.4712934 4
5 20103 6367 0.8712934 1
6 20103 6380 0.8712934 9
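To make the example reproducible, here is a small sketch that rebuilds the sample data above as an R data frame named `dd`, the name the answer assumes. The values are copied from the printout. Storing the ID columns as integers is an assumption, since the printout does not show column types.

```r
# Rebuild the sample data shown above as a data frame called dd.
# Integer type for weekyear and Location_Id is an assumption.
dd <- data.frame(
  weekyear    = c(20101L, 20101L, 20102L, 20102L, 20103L, 20103L),
  Location_Id = c(6367L, 6380L, 6367L, 6380L, 6367L, 6380L),
  priceA      = c(0.8712934, 0.1712934, 0.8712934, 0.4712934, 0.8712934, 0.8712934),
  priceB      = c(8, 8, 4, 4, 1, 9)
)
str(dd)
```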
I want to demean priceA and priceB. Each is indexed by location and time.
I want priceAnew = priceA_{location,time} - mean(over time)(priceA_{location}) - mean(over location)(priceA_{time}), and likewise for priceB.
The notation is clearer here: https://stats.stackexchange.com/questions/126549/do-people-used-fixed-effects-in-lasso
Is there a painless way to do this?
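Written in conventional panel notation, the requested transformation looks like this. The symbols are introduced here for illustration and do not come from the post: $p_{\ell t}$ is the price at location $\ell$ in week $t$, $T_\ell$ is the number of weeks observed for location $\ell$, and $L_t$ is the number of locations observed in week $t$.

```latex
\tilde{p}_{\ell t} = p_{\ell t} - \bar{p}_{\ell\cdot} - \bar{p}_{\cdot t},
\qquad
\bar{p}_{\ell\cdot} = \frac{1}{T_\ell}\sum_{t} p_{\ell t},
\qquad
\bar{p}_{\cdot t} = \frac{1}{L_t}\sum_{\ell} p_{\ell t}
```

For comparison, the textbook two-way "within" transformation for a balanced panel also adds back the grand mean, $\ddot{p}_{\ell t} = p_{\ell t} - \bar{p}_{\ell\cdot} - \bar{p}_{\cdot t} + \bar{p}_{\cdot\cdot}$. Because $\bar{p}_{\cdot\cdot}$ is the same for every row, the two versions differ only by a constant, and an unpenalized intercept would absorb that constant.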
Answer (score: 5)
I guess you're looking for something like
transform(dd,
newA = priceA-ave(priceA, weekyear)-ave(priceA, Location_Id),
newB = priceB-ave(priceB, weekyear)-ave(priceB, Location_Id)
)
(where dd is the name of your data.frame). For your sample input, this returns
weekyear Location_Id priceA priceB newA newB
1 20101 6367 0.8712934 8 -0.5212934 -4.333333
2 20101 6380 0.1712934 8 -0.8546267 -7.000000
3 20102 6367 0.8712934 4 -0.6712934 -4.333333
4 20102 6380 0.4712934 4 -0.7046267 -7.000000
5 20103 6367 0.8712934 1 -0.8712934 -8.333333
6 20103 6380 0.8712934 9 -0.5046267 -3.000000
If you need to do this for many columns, I would probably use a loop instead:
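cols <- paste0("price", LETTERS[1:2])  # c("priceA", "priceB")
for (col in cols) {
  # subtract the per-week mean and the per-location mean of this column;
  # the results are stored as newpriceA, newpriceB
  dd[[paste0("new", col)]] <- dd[[col]] -
    ave(dd[[col]], dd$weekyear) -
    ave(dd[[col]], dd$Location_Id)
}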
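The loop can also be wrapped in a reusable function. This is a minimal sketch: `demean_two_way` and its arguments `time`, `unit` and `add_grand_mean` are hypothetical names introduced here, not part of the original answer, and the defaults assume the ID columns are called weekyear and Location_Id. With `add_grand_mean = FALSE` it applies the question's formula. With `TRUE` it adds back the overall column mean, and in a balanced panel the result then sums to zero within every week and every location.

```r
# Hypothetical helper (not from the original answer): for every column in
# `cols`, subtract its per-`time` mean and per-`unit` mean, and store the
# result in a new column named new<col>. If add_grand_mean = TRUE, the
# overall column mean is added back (the usual two-way "within" transform).
demean_two_way <- function(df, cols, time = "weekyear", unit = "Location_Id",
                           add_grand_mean = FALSE) {
  for (col in cols) {
    x <- df[[col]]
    res <- x - ave(x, df[[time]]) - ave(x, df[[unit]])
    if (add_grand_mean) res <- res + mean(x)
    df[[paste0("new", col)]] <- res
  }
  df  # returns a modified copy; the caller's data frame is unchanged
}

# Sample data from the question
dd <- data.frame(
  weekyear    = c(20101L, 20101L, 20102L, 20102L, 20103L, 20103L),
  Location_Id = c(6367L, 6380L, 6367L, 6380L, 6367L, 6380L),
  priceA      = c(0.8712934, 0.1712934, 0.8712934, 0.4712934, 0.8712934, 0.8712934),
  priceB      = c(8, 8, 4, 4, 1, 9)
)

out <- demean_two_way(dd, c("priceA", "priceB"))
print(out)
```

Returning a copy rather than modifying `dd` in place follows R's usual copy-on-modify semantics, so the call is used as `dd <- demean_two_way(dd, cols)`. The `newpriceA` and `newpriceB` columns should match the `newA` and `newB` values shown in the answer.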