Getting pairwise differences

Time: 2019-01-21 16:03:17

Tags: r dplyr pairwise

I am trying to get the pairwise price differences between localities. My data looks like this:

table=NULL
table$id= 1:9
table$locality= c("A", "B", "C")
table$price= rnorm(9, 444, 322)
table$concat=paste(table$id, table$locality)
final=data.frame(table)
final

   id locality    price concat
   1        A 379.1501    1 A
   2        B 792.3608    2 B
   3        C 762.0627    3 C
   4        A 439.0378    4 A
   5        B 100.2860    5 B
   6        C 830.2024    6 C
   7        A 148.5925    7 A
   8        B 668.3819    8 B
   9        C 510.3919    9 C

My goal is to get, if possible, a table like the one below, where each cell is identified by the concat values of the two rows being compared:

  diff_A-B | diff_A-C | diff_B-C
    1A-2B  |  1A-3C   |  2B-3C
    1A-5B  |  1A-6C   |  2B-6C
    1A-8B  |  1A-9C   |  2B-9C
    4A-2B  |  4A-3C   |  5B-3C
    4A-5B  |  4A-6C   |  5B-6C
    4A-8B  |  4A-9C   |  5B-9C
    7A-2B  |  7A-3C   |  8B-3C
    7A-5B  |  7A-6C   |  8B-6C
    7A-8B  |  7A-9C   |  8B-9C

I tried:

  library(dplyr)
  table %>%
    arrange(id, locality) %>% 
    group_by(concat) %>%
    mutate(variables=outer(price,price, "-"))

But the output does not show the result I need.

Any suggestions would be appreciated.

2 answers:

Answer 0 (score: 3)

First, we group the entries in threes

final$group_number <- (final$id -1) %/%3 + 1
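To see what this integer division does, here is a minimal sketch on the ids 1 to 9 alone (the standalone vector ids is only for illustration and is not part of the answer):

```r
ids <- 1:9
# Shift to 0-based, integer-divide by the group size 3, shift back to 1-based
group_number <- (ids - 1) %/% 3 + 1
print(group_number)
# [1] 1 1 1 2 2 2 3 3 3
```

Rows 1 to 3 land in group 1, rows 4 to 6 in group 2, and rows 7 to 9 in group 3.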

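Using spread() from tidyr, you can turn the rows into columns, and then simply apply arithmetic operators to get the differences

library(dplyr)
library(tidyr)

final %>%
  select(group_number, locality, price) %>%
  spread(locality, price) %>%
  mutate(diff_A_B = A - B, diff_B_C = B - C, diff_C_A = C - A)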
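In current tidyr releases spread() is superseded by pivot_wider(). A minimal sketch of the same step with pivot_wider(), assuming the prices printed in the question are used as fixed data instead of rnorm() so that the result is reproducible, might look like this:

```r
library(dplyr)
library(tidyr)

# Fixed prices copied from the question's printed output (an assumption made
# here only so that the example is reproducible)
final <- data.frame(
  id = 1:9,
  locality = rep(c("A", "B", "C"), times = 3),
  price = c(379.1501, 792.3608, 762.0627, 439.0378, 100.2860,
            830.2024, 148.5925, 668.3819, 510.3919)
)
final$group_number <- (final$id - 1) %/% 3 + 1

# One row per group, one column per locality, then the column-wise differences
wide <- final %>%
  select(group_number, locality, price) %>%
  pivot_wider(names_from = locality, values_from = price) %>%
  mutate(diff_A_B = A - B, diff_B_C = B - C, diff_C_A = C - A)
wide
```

Note that this yields one row per group, so it only compares localities within each block of three rows. It does not produce all nine A-B combinations shown in the question's target table.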

Hope this helps!

Answer 1 (score: 0)

In the end, thanks to @Aakash Yadav's help, I was able to get the output I wanted:

table=NULL
table$id= 1:9
table$locality= c("A", "B", "C")
table$price= rnorm(9, 444, 322)
table$concat=paste(table$id, table$locality)
final=data.frame(table)
final

      id locality     price concat
    1  1        A 740.98988    1 A
    2  2        B 714.53925    2 B
    3  3        C -80.83328    3 C
    4  4        A  20.11773    4 A
    5  5        B 423.78615    5 B
    6  6        C 155.25605    6 C
    7  7        A 806.36096    7 A
    8  8        B 182.55169    8 B
    9  9        C 682.04208    9 C


final$group_number <- (final$id -1) %/%3 + 1

library(dplyr)
library(tidyr)

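# One row per group, one column per locality (A, B, C)
w <- final %>%
  select(group_number, locality, price) %>%
  spread(locality, price)

# outer() gives every combination of two localities; t() before as.vector()
# makes the second locality vary fastest (A1-B1, A1-B2, A1-B3, A2-B1, ...).
# abs() keeps only the magnitude of each difference.
A_B <- as.vector(t(abs(outer(w$A, w$B, "-"))))
A_C <- as.vector(t(abs(outer(w$A, w$C, "-"))))
B_C <- as.vector(t(abs(outer(w$B, w$C, "-"))))

df <- data.frame(A_B, A_C, B_C)
df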

            A_B       A_C       B_C
    1  26.45063 821.82315 795.37252
    2 317.20373 585.73383 559.28320
    3 558.43819  58.94779  32.49717
    4 694.42152 100.95100 504.61942
    5 403.66842 135.13832 268.53010
    6 162.43396 661.92435 258.25594
    7  91.82171 887.19423 263.38496
    8 382.57481 651.10491  27.29564
    9 623.80927 124.31888 499.49039
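
The result above has no labels telling which two rows each difference belongs to. As a sketch only, assuming the question's printed prices as fixed data and using hypothetical names (prices, labels, pairs, pairwise), the same all-pairs idea can be written in base R so that each difference carries a label such as 1A-2B:

```r
# Fixed prices copied from the question's printed output (assumption, for reproducibility)
final <- data.frame(
  id = 1:9,
  locality = rep(c("A", "B", "C"), times = 3),
  price = c(379.1501, 792.3608, 762.0627, 439.0378, 100.2860,
            830.2024, 148.5925, 668.3819, 510.3919)
)

# Prices and row labels, split by locality
prices <- split(final$price, final$locality)
labels <- split(paste0(final$id, final$locality), final$locality)

# All unordered pairs of localities: A-B, A-C, B-C
pairs <- combn(names(prices), 2, simplify = FALSE)

pairwise <- do.call(rbind, lapply(pairs, function(p) {
  x <- p[1]
  y <- p[2]
  # expand.grid() varies its first argument fastest, so j (the second
  # locality) cycles while i (the first locality) stays fixed
  grid <- expand.grid(j = seq_along(prices[[y]]), i = seq_along(prices[[x]]))
  data.frame(
    comparison = paste(x, y, sep = "-"),
    pair = paste(labels[[x]][grid$i], labels[[y]][grid$j], sep = "-"),
    diff = prices[[x]][grid$i] - prices[[y]][grid$j]
  )
}))
pairwise
```

This returns a long table with 27 rows (9 per comparison) in the same order as the target table in the question. The differences here are signed; wrap the diff expression in abs() to get absolute differences as in the code above.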