Demeaning data over multiple levels in R

Time: 2014-12-11 20:23:42

Tags: r dataframe transform

I have a data frame that looks like this:

  weekyear Location_Id    priceA priceB
1    20101        6367 0.8712934      8
2    20101        6380 0.1712934      8
3    20102        6367 0.8712934      4
4    20102        6380 0.4712934      4
5    20103        6367 0.8712934      1
6    20103        6380 0.8712934      9

I want to demean priceA and priceB. Each one is indexed by location and time.

I want:
priceAnew = priceA_{location,time} - mean(over time)(priceA_{location}) - mean(over location)(priceA_{time})

The notation is clearer here: https://stats.stackexchange.com/questions/126549/do-people-used-fixed-effects-in-lasso

Is there a painless way to do this?

1 answer:

Answer 0: (score: 5)

I'm guessing you're looking for something like
transform(dd, 
    newA = priceA-ave(priceA, weekyear)-ave(priceA, Location_Id),
    newB = priceB-ave(priceB, weekyear)-ave(priceB, Location_Id)
)

(where dd is the name of your data.frame). This returns

  weekyear Location_Id    priceA priceB       newA      newB
1    20101        6367 0.8712934      8 -0.5212934 -4.333333
2    20101        6380 0.1712934      8 -0.8546267 -7.000000
3    20102        6367 0.8712934      4 -0.6712934 -4.333333
4    20102        6380 0.4712934      4 -0.7046267 -7.000000
5    20103        6367 0.8712934      1 -0.8712934 -8.333333
6    20103        6380 0.8712934      9 -0.5046267 -3.000000

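for your sample input. If you had to do this over many columns, I might prefer a loop. Note that the loop names the new columns newpriceA and newpriceB rather than newA and newB.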

cols <- paste0("price", LETTERS[1:2])  # "priceA", "priceB"
for (col in cols) {
    # subtract the per-week mean and the per-location mean from each column
    dd[[paste0("new", col)]] <- dd[[col]] -
        ave(dd[[col]], dd$weekyear) -
        ave(dd[[col]], dd$Location_Id)
}
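
As a self-contained check, the sketch below rebuilds the sample data from the question and confirms that the transform() call and the loop produce the same values. The object names res and dd_loop are only illustrative and do not come from the original post. It relies on the fact that ave(x, g) uses mean by default and returns a vector as long as x, with each element replaced by its group mean.

```r
# Rebuild the sample data from the question (values copied from the post).
dd <- data.frame(
    weekyear    = c(20101, 20101, 20102, 20102, 20103, 20103),
    Location_Id = c(6367, 6380, 6367, 6380, 6367, 6380),
    priceA      = c(0.8712934, 0.1712934, 0.8712934,
                    0.4712934, 0.8712934, 0.8712934),
    priceB      = c(8, 8, 4, 4, 1, 9)
)

# Approach 1: transform() with one expression per column.
# ave() defaults to FUN = mean, so each call yields row-aligned group means.
res <- transform(dd,
    newA = priceA - ave(priceA, weekyear) - ave(priceA, Location_Id),
    newB = priceB - ave(priceB, weekyear) - ave(priceB, Location_Id)
)

# Approach 2: loop over the column names (hypothetical copy dd_loop,
# so that dd itself stays unchanged).
dd_loop <- dd
for (col in c("priceA", "priceB")) {
    dd_loop[[paste0("new", col)]] <- dd_loop[[col]] -
        ave(dd_loop[[col]], dd_loop$weekyear) -
        ave(dd_loop[[col]], dd_loop$Location_Id)
}

# Both approaches should agree up to floating-point tolerance.
same_A <- isTRUE(all.equal(res$newA, dd_loop$newpriceA))
same_B <- isTRUE(all.equal(res$newB, dd_loop$newpriceB))
print(res)
```

If same_A and same_B are both TRUE, the two approaches are interchangeable for this data, and the only visible difference is the naming of the new columns.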