计算不平衡集中的差异价格

时间:2019-04-15 19:14:26

标签: r dplyr

我有一个不平衡的数据框,其中包含日期,地区和价格。我想按日期计算不同地区之间的差异价格。我的数据不平衡,要想得到所有差异价格,我想创建数据(地点)来平衡数据。

我的数据如下:

library(dplyr)
set.seed(123)
df= data.frame(date=(1:3),
    locality= rbinom(21,3, 0.2),
    price=rnorm(21, 50, 20)) 

df %>%
arrange(date, locality)

>    date locality     price
1     1        0  60.07625
2     1        0  35.32994
3     1        0  63.69872
4     1        1  54.76426
5     1        1  66.51080
6     1        1  28.28602
7     1        2  47.09213
8     2        0  26.68910
9     2        1 100.56673
10    2        1  48.88628
11    2        1  48.29153
12    2        2  29.02214
13    2        2  45.68269
14    2        2  43.59887
15    3        0  60.98193
16    3        0  75.89527
17    3        0  43.30174
18    3        0  71.41221
19    3        0  33.62969
20    3        1  34.31236
21    3        1  23.76955

要获得平衡的数据,我想:

>    date locality     price
1     1        0  60.07625
2     1        0  35.32994
3     1        0  63.69872
4     1        1  54.76426
5     1        1  66.51080
6     1        1  28.28602
7     1        2  47.09213
8     1        2  NA
9     1        2  NA
10    2        0  26.68910
10    2        0  NA
10    2        0  NA
11    2        1 100.56673
12    2        1  48.88628
13    2        1  48.29153
14    2        2  29.02214
15    2        2  45.68269
16    2        2  43.59887
etc...

最后,我想获得差异价格beetwen对地区:

>   date      diff(price, 0-1)     diff(price, 0-2)    diff(price, 1-2)
1     1      60.07625-54.76426    60.07625-47.09213    etc...
2     1      35.32994-66.51080    35.32994-NA
3     1      63.69872-28.28602    63.69872-NA

1 个答案:

答案 0 :(得分:0)

您不需要平衡数据。如果您使用dcast,它将为您添加NA

首先转换数据以显示每个位置的单独列

library(data.table)
library(tidyverse)
setDT(df)

df[, rid := rowid(date, locality)]
df2 <- dcast(df, rid + date ~ locality, value.var = 'price')

#     rid date        0         1        2
#  1:   1    1 60.07625  54.76426 47.09213
#  2:   1    2 26.68910 100.56673 29.02214
#  3:   1    3 60.98193  34.31236       NA
#  4:   2    1 35.32994  66.51080       NA
#  5:   2    2       NA  48.88628 45.68269
#  6:   2    3 75.89527  23.76955       NA
#  7:   3    1 63.69872  28.28602       NA
#  8:   3    2       NA  48.29153 43.59887
#  9:   3    3 43.30174        NA       NA
# 10:   4    3 71.41221        NA       NA
# 11:   5    3 33.62969        NA       NA

然后创建差异的数据框to_diff进行计算,并在其上pmap进行差异计算。这里的c0_1与您在问题diff(price, 0-1)中所说的相对应。

to_diff <- CJ(0:2, 0:2)[V1 < V2]
pmap(to_diff, ~ df2[[as.character(.x)]] - df2[[as.character(.y)]]) %>% 
  setNames(paste0('c', to_diff[[1]], '_', to_diff[[2]])) %>% 
  bind_cols(df2[, 1:2])

# A tibble: 11 x 5
#      c0_1   c0_2  c1_2   rid  date
#     <dbl>  <dbl> <dbl> <int> <int>
#  1   5.31  13.0   7.67     1     1
#  2 -73.9   -2.33 71.5      1     2
#  3  26.7   NA    NA        1     3
#  4 -31.2   NA    NA        2     1
#  5  NA     NA     3.20     2     2
#  6  52.1   NA    NA        2     3
#  7  35.4   NA    NA        3     1
#  8  NA     NA     4.69     3     2
#  9  NA     NA    NA        3     3
# 10  NA     NA    NA        4     3
# 11  NA     NA    NA        5     3