Comparing xy coordinates in a data frame

Time: 2018-08-02 21:35:29

Tags: r dataframe dplyr

I currently have a data frame like this:

x        y       category
159.5    143.5   1
157.5    180.5   1
127.5    159.5   1
115.5    115.5   2
179.5    101.5   2
97.5     103.5   2
149.5    397.5   3
179.5    297.5   3

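I want to compare each point with every other point in the data frame, and get the differences in x and y (for example, from point (159.5, 143.5) to point (157.5, 180.5) the absolute difference in x is 2 and the difference in y is +37).

I have tried several ways of doing this but haven't come close, and I ended up with too many for loops, which is far too slow. I'm fairly sure there is a dplyr / function-based way to do this, and any pointers would really help.

Here is my target output (not fully filled in, but it shows the general idea):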

x        y        abs x-diff   y-diff  category            
159.5    143.5    0            0       1         (from 159.5    143.5)    
159.5    143.5    2            37      1         (from 157.5    180.5)
159.5    143.5    32           16      1         (from 127.5    159.5)
157.5    180.5    0            0       1         (from 157.5    180.5)
157.5    180.5    2            -37     1         (from 159.5    143.5)
157.5    180.5                         1
127.5    159.5    0            0       1
127.5    159.5                         1
127.5    159.5                         1
115.5    115.5    0           0        2         (from 115.5    115.5)
115.5    115.5    64          -14      2         (from 179.5    101.5)
115.5    115.5    18          -12      2         (from 97.5     103.5)
179.5    101.5    0           0        2
179.5    101.5                         2
179.5    101.5                         2
97.5     103.5    0           0        2
97.5     103.5                         2
97.5     103.5                         2
149.5    397.5    0           0        3
149.5    397.5                         3         
179.5    297.5    0           0        3
179.5    297.5                         3

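There should be 3 * 3 rows (for category 1), 3 * 3 (for category 2) and 2 * 2 (for category 3), 22 rows in total.

Edit: I have added a category variable. I tried to adapt the earlier answers to make this work, but I want to compare the coordinates within each category only. The other answers don't handle this added layer, because they repeat the whole data frame n times, which is more complicated to do with group_by.

4 Answers:

Answer 0: (score: 1)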

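As an aside, your example output is inconsistent about whether you expect positive or negative values. For example, 159.5 from 157.5 = 2, and 157.5 from 159.5 is also a positive 2, whereas for y-diff you have both negative and positive values. If you need absolute values, consider wrapping xdiff and ydiff in abs() in the code below.

That said, using base R without loops, you can do: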

df <- read.table(text = "
x        y    
159.5    143.5
157.5    180.5
127.5    159.5
190.5    198.5
115.5    115.5
179.5    101.5
97.5     103.5
149.5    397.5", header = TRUE)

# Pair every point (x, y) with every point (fromx, fromy): n * n rows
n <- nrow(df)
df2 <- data.frame(x = rep(df$x, each = n),
                  y = rep(df$y, each = n),
                  xdiff = c(sapply(df$x, function(i) i - df$x)),  # x - fromx
                  ydiff = c(sapply(df$y, function(j) j - df$y)),  # y - fromy
                  fromx = rep(df$x, n),
                  fromy = rep(df$y, n))

       x     y xdiff ydiff fromx fromy
1  159.5 143.5     0     0 159.5 143.5
2  159.5 143.5     2   -37 157.5 180.5
3  159.5 143.5    32   -16 127.5 159.5
4  159.5 143.5   -31   -55 190.5 198.5
5  159.5 143.5    44    28 115.5 115.5
6  159.5 143.5   -20    42 179.5 101.5
7  159.5 143.5    62    40  97.5 103.5
8  159.5 143.5    10  -254 149.5 397.5
9  157.5 180.5    -2    37 159.5 143.5
10 157.5 180.5     0     0 157.5 180.5
11 157.5 180.5    30    21 127.5 159.5
12 157.5 180.5   -33   -18 190.5 198.5
13 157.5 180.5    42    65 115.5 115.5
14 157.5 180.5   -22    79 179.5 101.5
15 157.5 180.5    60    77  97.5 103.5
16 157.5 180.5     8  -217 149.5 397.5
17 127.5 159.5   -32    16 159.5 143.5
18 127.5 159.5   -30   -21 157.5 180.5
19 127.5 159.5     0     0 127.5 159.5
20 127.5 159.5   -63   -39 190.5 198.5
21 127.5 159.5    12    44 115.5 115.5
22 127.5 159.5   -52    58 179.5 101.5
23 127.5 159.5    30    56  97.5 103.5
24 127.5 159.5   -22  -238 149.5 397.5
25 190.5 198.5    31    55 159.5 143.5
26 190.5 198.5    33    18 157.5 180.5
27 190.5 198.5    63    39 127.5 159.5
28 190.5 198.5     0     0 190.5 198.5
29 190.5 198.5    75    83 115.5 115.5
30 190.5 198.5    11    97 179.5 101.5
31 190.5 198.5    93    95  97.5 103.5
32 190.5 198.5    41  -199 149.5 397.5
33 115.5 115.5   -44   -28 159.5 143.5
34 115.5 115.5   -42   -65 157.5 180.5
35 115.5 115.5   -12   -44 127.5 159.5
36 115.5 115.5   -75   -83 190.5 198.5
37 115.5 115.5     0     0 115.5 115.5
38 115.5 115.5   -64    14 179.5 101.5
39 115.5 115.5    18    12  97.5 103.5
40 115.5 115.5   -34  -282 149.5 397.5
41 179.5 101.5    20   -42 159.5 143.5
42 179.5 101.5    22   -79 157.5 180.5
43 179.5 101.5    52   -58 127.5 159.5
44 179.5 101.5   -11   -97 190.5 198.5
45 179.5 101.5    64   -14 115.5 115.5
46 179.5 101.5     0     0 179.5 101.5
47 179.5 101.5    82    -2  97.5 103.5
48 179.5 101.5    30  -296 149.5 397.5
49  97.5 103.5   -62   -40 159.5 143.5
50  97.5 103.5   -60   -77 157.5 180.5
51  97.5 103.5   -30   -56 127.5 159.5
52  97.5 103.5   -93   -95 190.5 198.5
53  97.5 103.5   -18   -12 115.5 115.5
54  97.5 103.5   -82     2 179.5 101.5
55  97.5 103.5     0     0  97.5 103.5
56  97.5 103.5   -52  -294 149.5 397.5
57 149.5 397.5   -10   254 159.5 143.5
58 149.5 397.5    -8   217 157.5 180.5
59 149.5 397.5    22   238 127.5 159.5
60 149.5 397.5   -41   199 190.5 198.5
61 149.5 397.5    34   282 115.5 115.5
62 149.5 397.5   -30   296 179.5 101.5
63 149.5 397.5    52   294  97.5 103.5
64 149.5 397.5     0     0 149.5 397.5

If you like, you can drop the rows where x == fromx and y == fromy (each point compared with itself) by running df2[!c(df2$x == df2$fromx & df2$y == df2$fromy),].
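For the edited question, where points should only be compared within their own category, one possible base R sketch is a self-join on category with merge(). The helper data frame `from` and the column names fromx/fromy are illustrative (they follow the naming used above), and the sign convention x - fromx is an assumption; wrap the differences in abs() if absolute values are wanted.

```r
# The question's data, including the category column
df <- read.table(text = "
x        y       category
159.5    143.5   1
157.5    180.5   1
127.5    159.5   1
115.5    115.5   2
179.5    101.5   2
97.5     103.5   2
149.5    397.5   3
179.5    297.5   3", header = TRUE)

# Copy of the points with renamed coordinate columns (illustrative names)
from <- data.frame(category = df$category, fromx = df$x, fromy = df$y)

# Joining on category alone pairs each point with every point
# (itself included) that shares its category
pairs <- merge(df, from, by = "category")
pairs$xdiff <- pairs$x - pairs$fromx
pairs$ydiff <- pairs$y - pairs$fromy

nrow(pairs)  # 22 = 3*3 + 3*3 + 2*2
```

The row order returned by merge() is not the same as in the target table, so sort the result afterwards if the order matters.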

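Answer 1: (score: 1)

Here is the set of all differences, built with outer and expand.grid:

# Assumes `dat` is the data frame of points (columns x and y)
cbind(cbind(with(dat, expand.grid(x = x, x = x)), xdiff = -c(with(dat, outer(x, x, "-")))),
      cbind(with(dat, expand.grid(y = y, y = y)), ydiff = -c(with(dat, outer(y, y, "-")))))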

#-----------
       x     x xdiff     y     y ydiff
1  159.5 159.5     0 143.5 143.5     0
2  157.5 159.5     2 180.5 143.5   -37
3  127.5 159.5    32 159.5 143.5   -16
4  190.5 159.5   -31 198.5 143.5   -55
5  115.5 159.5    44 115.5 143.5    28
6  179.5 159.5   -20 101.5 143.5    42
7   97.5 159.5    62 103.5 143.5    40
8  149.5 159.5    10 397.5 143.5  -254
9  159.5 157.5    -2 143.5 180.5    37
10 157.5 157.5     0 180.5 180.5     0
11 127.5 157.5    30 159.5 180.5    21
12 190.5 157.5   -33 198.5 180.5   -18
13 115.5 157.5    42 115.5 180.5    65
14 179.5 157.5   -22 101.5 180.5    79
  #----snipped rest of 64 rows
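To see why flattening the outer() matrix lines up with expand.grid(), here is a smaller sketch on the first three points only. The index columns i and j and the name `res` are illustrative, and the sign is x - fromx here, not the negated form used above.

```r
# Three points are enough to show the idea
dat <- data.frame(x = c(159.5, 157.5, 127.5),
                  y = c(143.5, 180.5, 159.5))

xd <- outer(dat$x, dat$x, "-")  # xd[i, j] = x[i] - x[j]
yd <- outer(dat$y, dat$y, "-")  # yd[i, j] = y[i] - y[j]

# expand.grid varies its first argument fastest, and c() flattens a
# matrix column by column (row index fastest), so the two orders agree
n <- nrow(dat)
idx <- expand.grid(i = seq_len(n), j = seq_len(n))

res <- data.frame(x = dat$x[idx$i], y = dat$y[idx$i],
                  fromx = dat$x[idx$j], fromy = dat$y[idx$j],
                  xdiff = c(xd), ydiff = c(yd))
res
```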

Answer 2: (score: 0)

Found a similar solution elsewhere.

Answer 3: (score: -1)

Here is one possible solution that does not use loops:

df <- data.frame(x = c(159.5, 157.5, 127.5, 190.5, 115.5, 179.5, 97.5, 149.5), y = c(143.5, 180.5, 159.5, 198.5, 115.5, 101.5, 103.5, 397.5) )
dx <- df$x[1:7] - df$x[2:8]
dy <- df$y[1:7] - df$y[2:8]

which produces the desired differences:

> dx
[1]   2  30 -63  75 -64  82 -52
> dy
[1]  -37   21  -39   83   14   -2 -294
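Note that this compares each row only with the row that follows it. A short sketch with base R's diff() makes that explicit: it gives the same vectors, and there are only n - 1 of these differences, against n * n rows for a full all-pairs comparison.

```r
df <- data.frame(x = c(159.5, 157.5, 127.5, 190.5, 115.5, 179.5, 97.5, 149.5),
                 y = c(143.5, 180.5, 159.5, 198.5, 115.5, 101.5, 103.5, 397.5))

# diff() returns x[i + 1] - x[i], so negate it to get x[i] - x[i + 1]
dx <- -diff(df$x)
dy <- -diff(df$y)

length(dx)  # 7, one fewer than the 8 rows
nrow(df)^2  # 64 ordered pairs in a full comparison
```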