I currently have a data frame like this:
x y category
159.5 143.5 1
157.5 180.5 1
127.5 159.5 1
115.5 115.5 2
179.5 101.5 2
97.5 103.5 2
149.5 397.5 3
179.5 297.5 3
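I want to compare every point with every other point in the data frame. I want the differences in x and y (for example, from point 159.5, 143.5 to point 157.5, 180.5 the absolute difference is 2 in x and +37 in y).
I have tried several ways of doing this but have not come close, and I ended up with too many for loops, which is far too slow. I am fairly sure there is a dplyr / function way to do this, and that would be really helpful.
Here is my target sample output (not fully filled in, but it shows the general idea):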
x y abs x-diff y-diff category
159.5 143.5 0 0 1 (from 159.5 143.5)
159.5 143.5 2 37 1 (from 157.5 180.5)
159.5 143.5 32 16 1 (from 127.5 159.5)
157.5 180.5 0 0 1 (from 157.5 180.5)
157.5 180.5 2 -37 1 (from 159.5 143.5)
157.5 180.5 1
127.5 159.5 0 0 1
127.5 159.5 1
127.5 159.5 1
115.5 115.5 0 0 2 (from 115.5 115.5)
115.5 115.5 64 -14 2 (from 179.5 101.5)
115.5 115.5 18 -12 2 (from 97.5 103.5)
179.5 101.5 0 0 2
179.5 101.5 2
179.5 101.5 2
97.5 103.5 0 0 2
97.5 103.5 2
97.5 103.5 2
149.5 397.5 0 0 3
149.5 397.5 3
179.5 297.5 0 0 3
179.5 297.5 3
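There should be 3 * 3 rows (for category 1), 3 * 3 (for category 2) and 2 * 2 (for category 3), 22 rows in total.
Edit: I have added a category variable. I tried to adapt the earlier answers to make this work, but I want to compare the coordinates within each category. The other answers do not work with this added layer, because they repeat the whole data frame n times, which is more complicated to do with group_by.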
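Answer 0 (score: 1)
As a side note, your example output is not consistent about whether you expect positive or negative values. For example, 159.5 from 157.5 = 2, while 159.5 from 190.5 and 157.5 from 159.5 are also positive 2, whereas for y-diff you have both negative and positive values. If you want absolute values, consider wrapping xdiff and ydiff in abs() in the code below.
That said, you can do this in base R without loops: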
df <- read.table(text = "
x y
159.5 143.5
157.5 180.5
127.5 159.5
190.5 198.5
115.5 115.5
179.5 101.5
97.5 103.5
149.5 397.5", header = TRUE)
n <- nrow(df)
# Repeat each point n times and pair it with every point (including itself)
df2 <- data.frame(x = rep(df$x, each = n),
                  y = rep(df$y, each = n),
                  xdiff = c(sapply(df$x, function(i) i - df$x)),
                  ydiff = c(sapply(df$y, function(j) j - df$y)),
                  fromx = rep(df$x, n),
                  fromy = rep(df$y, n))
x y xdiff ydiff fromx fromy
1 159.5 143.5 0 0 159.5 143.5
2 159.5 143.5 2 -37 157.5 180.5
3 159.5 143.5 32 -16 127.5 159.5
4 159.5 143.5 -31 -55 190.5 198.5
5 159.5 143.5 44 28 115.5 115.5
6 159.5 143.5 -20 42 179.5 101.5
7 159.5 143.5 62 40 97.5 103.5
8 159.5 143.5 10 -254 149.5 397.5
9 157.5 180.5 -2 37 159.5 143.5
10 157.5 180.5 0 0 157.5 180.5
11 157.5 180.5 30 21 127.5 159.5
12 157.5 180.5 -33 -18 190.5 198.5
13 157.5 180.5 42 65 115.5 115.5
14 157.5 180.5 -22 79 179.5 101.5
15 157.5 180.5 60 77 97.5 103.5
16 157.5 180.5 8 -217 149.5 397.5
17 127.5 159.5 -32 16 159.5 143.5
18 127.5 159.5 -30 -21 157.5 180.5
19 127.5 159.5 0 0 127.5 159.5
20 127.5 159.5 -63 -39 190.5 198.5
21 127.5 159.5 12 44 115.5 115.5
22 127.5 159.5 -52 58 179.5 101.5
23 127.5 159.5 30 56 97.5 103.5
24 127.5 159.5 -22 -238 149.5 397.5
25 190.5 198.5 31 55 159.5 143.5
26 190.5 198.5 33 18 157.5 180.5
27 190.5 198.5 63 39 127.5 159.5
28 190.5 198.5 0 0 190.5 198.5
29 190.5 198.5 75 83 115.5 115.5
30 190.5 198.5 11 97 179.5 101.5
31 190.5 198.5 93 95 97.5 103.5
32 190.5 198.5 41 -199 149.5 397.5
33 115.5 115.5 -44 -28 159.5 143.5
34 115.5 115.5 -42 -65 157.5 180.5
35 115.5 115.5 -12 -44 127.5 159.5
36 115.5 115.5 -75 -83 190.5 198.5
37 115.5 115.5 0 0 115.5 115.5
38 115.5 115.5 -64 14 179.5 101.5
39 115.5 115.5 18 12 97.5 103.5
40 115.5 115.5 -34 -282 149.5 397.5
41 179.5 101.5 20 -42 159.5 143.5
42 179.5 101.5 22 -79 157.5 180.5
43 179.5 101.5 52 -58 127.5 159.5
44 179.5 101.5 -11 -97 190.5 198.5
45 179.5 101.5 64 -14 115.5 115.5
46 179.5 101.5 0 0 179.5 101.5
47 179.5 101.5 82 -2 97.5 103.5
48 179.5 101.5 30 -296 149.5 397.5
49 97.5 103.5 -62 -40 159.5 143.5
50 97.5 103.5 -60 -77 157.5 180.5
51 97.5 103.5 -30 -56 127.5 159.5
52 97.5 103.5 -93 -95 190.5 198.5
53 97.5 103.5 -18 -12 115.5 115.5
54 97.5 103.5 -82 2 179.5 101.5
55 97.5 103.5 0 0 97.5 103.5
56 97.5 103.5 -52 -294 149.5 397.5
57 149.5 397.5 -10 254 159.5 143.5
58 149.5 397.5 -8 217 157.5 180.5
59 149.5 397.5 22 238 127.5 159.5
60 149.5 397.5 -41 199 190.5 198.5
61 149.5 397.5 34 282 115.5 115.5
62 149.5 397.5 -30 296 179.5 101.5
63 149.5 397.5 52 294 97.5 103.5
64 149.5 397.5 0 0 149.5 397.5
If you like, you can drop the rows where x == fromx and y == fromy (each point compared with itself) with df2[!c(df2$x == df2$fromx & df2$y == df2$fromy),].
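The question's edit asks for comparisons within each category only, which the code above does not cover. One way this could be sketched in base R is a self-join on category with merge(). The data is the sample from the question; the names pairs, x_from and y_from are illustrative choices rather than anything from the original posts, and the sign convention (point minus "from" point) is an assumption, since the example output in the question mixes signs.

```r
# Sample data from the question, including the category column
df <- read.table(text = "
x y category
159.5 143.5 1
157.5 180.5 1
127.5 159.5 1
115.5 115.5 2
179.5 101.5 2
97.5 103.5 2
149.5 397.5 3
179.5 297.5 3", header = TRUE)

# Self-join on category: each row is paired with every row of the same
# category (including itself). The second copy's columns get "_from".
pairs <- merge(df, df, by = "category", suffixes = c("", "_from"))

# Signed differences (assumed convention: point minus "from" point);
# wrap in abs() if only magnitudes are wanted
pairs$xdiff <- pairs$x - pairs$x_from
pairs$ydiff <- pairs$y - pairs$y_from

nrow(pairs)  # 3*3 + 3*3 + 2*2 = 22
```

Because the join key is category, points from different groups are never paired, so no filtering step is needed afterwards.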
Answer 1 (score: 1)
Here is the set of all differences, built with outer and expand.grid:
cbind(cbind(with(dat, expand.grid(x=x,x=x)), xdiff=-c( with(dat, outer(x,x,"-") ))),
cbind( with(dat, expand.grid(y=y,y=y)), ydiff=-c( with(dat, outer(y,y,"-") ))))
#-----------
x x xdiff y y ydiff
1 159.5 159.5 0 143.5 143.5 0
2 157.5 159.5 2 180.5 143.5 -37
3 127.5 159.5 32 159.5 143.5 -16
4 190.5 159.5 -31 198.5 143.5 -55
5 115.5 159.5 44 115.5 143.5 28
6 179.5 159.5 -20 101.5 143.5 42
7 97.5 159.5 62 103.5 143.5 40
8 149.5 159.5 10 397.5 143.5 -254
9 159.5 157.5 -2 143.5 180.5 37
10 157.5 157.5 0 180.5 180.5 0
11 127.5 157.5 30 159.5 180.5 21
12 190.5 157.5 -33 198.5 180.5 -18
13 115.5 157.5 42 115.5 180.5 65
14 179.5 157.5 -22 101.5 180.5 79
#----snipped rest of 64 rows
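The snippet above never defines dat, and it yields two columns named x and two named y. A minimal sketch of the same idea, assuming dat holds the eight points used in the first answer, expands row indices instead so that every column gets a distinct name. The names idx, res, from, to, fromx and fromy are illustrative only.

```r
# Assumed input: the eight points used in the first answer
dat <- data.frame(
  x = c(159.5, 157.5, 127.5, 190.5, 115.5, 179.5, 97.5, 149.5),
  y = c(143.5, 180.5, 159.5, 198.5, 115.5, 101.5, 103.5, 397.5)
)

# All ordered pairs of row indices; the first column (from) varies fastest,
# which matches the row order of expand.grid(x = x, x = x) above
idx <- expand.grid(from = seq_len(nrow(dat)), to = seq_len(nrow(dat)))

res <- data.frame(
  x     = dat$x[idx$to],
  y     = dat$y[idx$to],
  fromx = dat$x[idx$from],
  fromy = dat$y[idx$from]
)

# Same sign convention as -c(outer(x, x, "-")): point minus "from" point
res$xdiff <- res$x - res$fromx
res$ydiff <- res$y - res$fromy
```

Indexing by row also carries any extra columns along easily, for example dat$category[idx$to] if the data had one.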
Answer 2 (score: 0)
Found a similar solution elsewhere.
Answer 3 (score: -1)
Here is one possible solution that does not use loops:
df <- data.frame(x = c(159.5, 157.5, 127.5, 190.5, 115.5, 179.5, 97.5, 149.5), y = c(143.5, 180.5, 159.5, 198.5, 115.5, 101.5, 103.5, 397.5) )
dx <- df$x[1:7] - df$x[2:8]
dy <- df$y[1:7] - df$y[2:8]
This produces the differences between consecutive points (each row minus the next one):
> dx
[1] 2 30 -63 75 -64 82 -52
> dy
[1] -37 21 -39 83 14 -2 -294
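For reference, the same row-to-next-row differences can be written with base R's diff(), as in the short sketch below. diff() returns next minus current, so the result is negated to match dx and dy above. This covers only the 7 adjacent pairs, not all pairs of points.

```r
df <- data.frame(
  x = c(159.5, 157.5, 127.5, 190.5, 115.5, 179.5, 97.5, 149.5),
  y = c(143.5, 180.5, 159.5, 198.5, 115.5, 101.5, 103.5, 397.5)
)

# diff(v) gives v[i + 1] - v[i]; negate to get v[i] - v[i + 1]
dx <- -diff(df$x)
dy <- -diff(df$y)
```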