Question

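My complete data (the output of dput()) is at the end of the question. I'm trying to create a tile plot with ggplot(), and my measurements are unevenly spaced in x and y, so the tiles don't fill the whole area. Here is an example: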

library(ggplot2)
ggplot(data, aes(x = x, y = -y, z = d)) + geom_tile(aes(fill = d))

[Image: unevenly spaced tiles]

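I'm not positive, but I think ggplot probably defaults to a tile size of something like unique(data$x)[2] - unique(data$x)[1], so the rows of my data that really are that distance apart in successive x or y measurements will touch, but the rest will not. I figured I'd use plyr and ddply() to make height and width columns for my data, but I'm running into strange results.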
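To check that guess about spacing, the gaps between successive unique coordinates can be listed directly. This is only a minimal sketch using the x positions from my data; the idea that geom_tile() falls back to one width derived from the smallest gap is my assumption, not something I have confirmed.

```r
# Unique x positions taken from the data set in this question
x_vals <- c(2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71)

# Gaps between neighbouring measurement positions
gaps <- diff(sort(unique(x_vals)))
print(gaps)

# The smallest gap; tiles of this single width leave holes wherever the gap is larger
print(min(gaps))
```

If the gaps are not all equal, one fixed tile width cannot cover the area, which is why per-row width and height values are needed.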

For those who don't want to load the full data, here is the structure:

head(data, 5)

     x y       d
1  2.0 0 0.28125
2  5.5 0 0.81250
3 11.5 0 0.56250
4 17.5 0 0.46875
5 23.5 0 0.40625

tail(data, 5)

       x    y     d
191 47.5 80.5 0.000
192 53.5 80.5 0.125
193 59.5 80.5 0.000
194 65.5 80.5 0.000
195 71.0 80.5 0.000

So, I cycled through each value of x for each unique value of y. Here's how I tried to set up the height/width columns:

library(plyr)

# for each unique value of y, take diff() of the x's and append a 1 as the last element
data$width <- ddply(data, .(y), summarize, width = c(diff(x), 1))$width

# for each unique value of x, take diff() of the y's and append a 1 as the last element
data$height <- ddply(data, .(x), summarize, height = c(diff(y), 1))$height

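Since diff() on n values returns a result of length n-1, I just tacked a 1 onto the end, figuring I'd work out the right value to append later. This is what I get, though:

ggplot(data, aes(x = x, y = -y, z = d)) + geom_tile(aes(fill = d, height = height, width = width))

[Image: wrong heights]

The widths are correct, but the heights are not. On investigation:

head(data, 5)

      x y       d height width
1   2.0 0 0.28125    5.5   3.5
2   5.5 0 0.81250    6.5   6.0
3  11.5 0 0.56250    6.0   6.0
4  17.5 0 0.46875    6.0   6.0
5  23.5 0 0.40625    6.0   6.0

So we can see that the widths are correct: 2 -> 5.5 = 3.5, 5.5 -> 11.5 = 6, and so on.

But the heights are not. If we look at the output for a constant x value only, we can see:

head(data[data$x == 2, ], 5)

   x    y       d height width
1  2  0.0 0.28125    5.5   3.5
14 2  5.5 0.37500    4.5   3.5
27 2 12.0 0.37500    4.5   3.5
40 2 18.0 0.56250    6.0   3.5
53 2 24.0 0.25000    6.0   3.5

The first should be 5.5 (correct), but the second should be 6.5, then 6, and so on.

If I run my ddply function manually by subsetting the data myself, it seems to work:

c(diff(data[data$x == 2, "y"]), 1)
 [1] 5.5 6.5 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 4.5 5.5 4.5 1.0

On re-checking the height values, they appear to be the same, but rearranged. Based on this observation, I re-sorted my data as if I had collected it by cycling through each value of y for each unique x, rather than the other way around, and then redefined my height and width columns: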

data_sort <- data[order(data$x, data$y), c("x", "y", "d")]
data_sort$width <- ddply(data_sort, .(y), summarize, width = c(diff(x), 1))$width
data_sort$height <- ddply(data_sort, .(x), summarize, height = c(diff(y), 1))$height

The heights are now correct, but the widths are messed up:

head(data_sort, 6)
   x    y       d width height
1  2  0.0 0.28125   3.5    5.5
14 2  5.5 0.37500   6.0    6.5
27 2 12.0 0.37500   6.0    6.0
40 2 18.0 0.56250   6.0    6.0
53 2 24.0 0.25000   6.0    6.0
66 2 30.0 0.31250   6.0    6.0

What am I missing about ddply not preserving order when it searches for unique but non-consecutive levels/values?

Data:

dput(data)
structure(list(x = c(2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 
47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 
41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 
35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 
29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 
23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 
17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 
5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 
71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 
65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 
59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 47.5, 
53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5, 
47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 
41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 29.5, 
35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 23.5, 
29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71, 2, 5.5, 11.5, 17.5, 
23.5, 29.5, 35.5, 41.5, 47.5, 53.5, 59.5, 65.5, 71), y = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.5, 5.5, 5.5, 5.5, 5.5, 
5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 12, 12, 12, 12, 12, 12, 
12, 12, 12, 12, 12, 12, 12, 18, 18, 18, 18, 18, 18, 18, 18, 18, 
18, 18, 18, 18, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
24, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 42, 42, 42, 42, 42, 
42, 42, 42, 42, 42, 42, 42, 42, 48, 48, 48, 48, 48, 48, 48, 48, 
48, 48, 48, 48, 48, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 
54, 54, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 66, 
66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 70.5, 70.5, 70.5, 
70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 70.5, 76, 
76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 80.5, 80.5, 80.5, 
80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5), 
    d = c(0.28125, 0.8125, 0.5625, 0.46875, 0.40625, 0.3125, 
    0.25, 0.125, 0.09375, 0.0625, 0.1875, 0.25, 0, 0.375, 0.46875, 
    0.5, 0.4375, 0.4375, 0.3125, 0.28125, 0.1875, 0.125, 0.0625, 
    0.1875, 0.3125, 0.5, 0.375, 0.25, 0.375, 0.4375, 0.375, 0.3125, 
    0.28125, 0.15625, 0.125, 0.0625, 0.1875, 0.3125, 0.5, 0.5625, 
    0.375, 0.4375, 0.40625, 0.375, 0.3125, 0.25, 0.15625, 0.09375, 
    0.0625, 0.125, 0.28125, 0.3125, 0.25, 0.34375, 0.40625, 0.40625, 
    0.375, 0.3125, 0.21875, 0.125, 0.09375, 0.0625, 0.125, 0.25, 
    0.3125, 0.3125, 0.375, 0.40625, 0.40625, 0.375, 0.3125, 0.21875, 
    0.09375, 0.0625, 0, 0.09375, 0.15625, 0.25, 0.28125, 0.34375, 
    0.40625, 0.4375, 0.4375, 0.375, 0.3125, 0.1875, 0.15625, 
    0.0625, 0.125, 0.25, 0.3125, 0.3125, 0.375, 0.4375, 0.46875, 
    0.46875, 0.4375, 0.375, 0.28125, 0.5625, 0.0625, 0.125, 0.25, 
    0.34375, 0.3125, 0.4375, 0.4375, 0.5, 0.5, 0.5, 0.4375, 0.34375, 
    0.21875, 0.0625, 0.125, 0.25, 0.34375, 0.3125, 0.4375, 0.4375, 
    0.46875, 0.5, 0.5, 0.4375, 0.34375, 0.21875, 0.09375, 0.15625, 
    0.3125, 0.34375, 0.25, 0.34375, 0.34375, 0.375, 0.375, 0.6875, 
    0.3125, 0.1875, 0.125, 0.0625, 0.125, 0.25, 0.3125, 0.125, 
    0.21875, 0.28125, 0.28125, 0.25, 0.25, 0.1875, 0.09375, 0.0625, 
    0.0625, 0.1875, 0.3125, 0.4375, 0, 0.125, 0.1875, 0.1875, 
    0.21875, 0.1875, 0.1875, 0.28125, 0.15625, 0.125, 0.125, 
    0.375, 0.625, 0, 0.0625, 0.09375, 0.09375, 0.21875, 0.21875, 
    0.21875, 0.21875, 0.1875, 0.15625, 0.4375, 0.625, 0, 0, 0, 
    0, 0.09375, 0.125, 0.125, 0.09375, 0.0625, 0, 0.125, 0, 0, 
    0)), .Names = c("x", "y", "d"), row.names = c(NA, -195L), class = "data.frame")

Answer 1

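Silly, silly, silly.

ddply returns its output rearranged into the order in which it processes the groups, and when I extracted only the height column of that output I completely overlooked (forgot/was ignorant of) this fact. So even though my data was sorted first by y and then by x, when I call ddply to calculate something by unique x and /then/ y, that is the order in which it delivers the output.

Just to prove it: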

head(data)
     x y       d
1  2.0 0 0.28125
2  5.5 0 0.81250
3 11.5 0 0.56250
4 17.5 0 0.46875
5 23.5 0 0.40625
6 29.5 0 0.31250

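Looking at the full output of my ddply call shows that the grouping by y matches the way the rows appear in the original data, so cbind-ing that column on as data$width works fine: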

widths <- ddply(data, .(y), summarize, width = c(diff(x), 1))
head(widths)
  y width
1 0   3.5
2 0   6.0
3 0   6.0
4 0   6.0
5 0   6.0
6 0   6.0

But when I do the same for the heights, the output is grouped by unique x, which is not how my data is arranged:

heights <- ddply(data, .(x), summarize, height = c(diff(y), 1))
head(heights)
  x height
1 2    5.5
2 2    6.5
3 2    6.0
4 2    6.0
5 2    6.0
6 2    6.0
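The mismatch can be checked directly by comparing the grouping column of each ddply result with the same column in the input. The sketch below uses a small made-up grid (toy is a hypothetical stand-in for my data, laid out the same way) and assumes plyr is installed.

```r
library(plyr)

# Toy grid laid out like the real data: x cycles fastest within each y
toy <- expand.grid(x = c(2, 5.5, 11.5), y = c(0, 5.5, 12))

widths  <- ddply(toy, .(y), summarize, width  = c(diff(x), 1))
heights <- ddply(toy, .(x), summarize, height = c(diff(y), 1))

# Grouping by y reproduces the row order of `toy`, so this comparison holds
print(all(widths$y == toy$y))

# Grouping by x returns the rows grouped by x, so this comparison fails
print(all(heights$x == toy$x))
```

Only when the comparison holds for every row is it safe to paste the result column onto the original data frame by position.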

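This definitely didn't warrant a question. By extracting only the column I wanted, I completely overlooked how the form of the ddply output compares with my data.

To fix this, I should probably create two data frames, each containing the x and y values along with the height or width (calculated from diff()), and then merge them by the unique combinations of x and y.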
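A rough sketch of that merge idea follows. It is untested against the full data; toy and its d values are made up for illustration, and using transform (instead of summarize) to keep x and y in each result is my own choice here.

```r
library(plyr)

# Toy grid laid out like the real data, with made-up values standing in for d
toy <- expand.grid(x = c(2, 5.5, 11.5), y = c(0, 5.5, 12))
toy$d <- seq_len(nrow(toy)) / 10

# transform keeps x and y in each result, so rows can be matched by key, not by position
widths  <- ddply(toy, .(y), transform, width  = c(diff(x), 1))
heights <- ddply(toy, .(x), transform, height = c(diff(y), 1))

# Join on the (x, y) pair; the row order of either ddply() result no longer matters
toy_sized <- merge(widths, heights[, c("x", "y", "height")], by = c("x", "y"))

# Restore the original y-then-x ordering for easier comparison
toy_sized <- toy_sized[order(toy_sized$y, toy_sized$x), ]
print(toy_sized)
```

Matching on keys costs an extra merge but does not depend on how either result is sorted. Base R's ave() is another option, since it returns values in the original row order.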

ddply has different output depending on the sort/order of the .variables used to apply the function
