Question

I have a data frame like this:

df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10

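(The Value column is only there to show that each row is a data point; the actual values don't matter.) What I ultimately want to do is plot the values against their index within the subsets defined by Dim1 and Dim2. For that reason, I think I need to append a new column containing the indices, which would look like this (blank lines added between rows to make the subsets obvious):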

  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1

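How can I do this elegantly in R? I come from Python, and my default approach would be to loop over the combinations of Dim1 & Dim2, keep track of the number of rows in each, and assign the maximum encountered so far to each row. I've been trying to figure it out, but my vector-fu is weak.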
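For reference, the loop-based approach described above might look roughly like this in R. This is only a sketch of the idea; the names `counts`, `key` and `Index` are made up for illustration, and the Value column is left out because it plays no role.

```r
df <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200)
)

counts <- list()               # running row count per Dim1/Dim2 combination
Index <- integer(nrow(df))

for (i in seq_len(nrow(df))) {
  # Build a key that identifies the combination this row belongs to
  key <- paste(df$Dim1[i], df$Dim2[i], sep = "_")
  # A combination not seen before has no entry yet, so start from zero
  n <- if (is.null(counts[[key]])) 0L else counts[[key]]
  counts[[key]] <- n + 1L
  Index[i] <- counts[[key]]
}

df$Index <- Index
print(df)
```

It works, but it walks the rows one at a time, which is exactly what I would like to avoid.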

Answer 1

This may look like cheating, since I pass a vector to a function and then completely ignore it except to get its length:

 df$Index <- ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=function(x) 1:length(x) )

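The ave function returns a vector of the same length as its first argument, but computed within the categories defined by all the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in my call and get a cryptic error message along the lines of "unique() applies only to vectors", since it is trying to determine how many unique values an anonymous function possesses, and it fails.)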
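To make the grouping behaviour concrete, here is a minimal sketch on made-up vectors (`x` and `g` are illustrative names, not from the question). It shows ave computing within groups while keeping the original length, and what happens when "FUN=" is left out:

```r
x <- c(10, 20, 30, 40)
g <- c("a", "a", "b", "b")

# Group means, repeated for every member of the group
group_means <- ave(x, g, FUN = mean)

# Without "FUN =", the function is treated as one more grouping variable,
# so the call fails; tryCatch captures the error instead of stopping
res <- tryCatch(
  ave(x, g, function(v) seq_along(v)),
  error = function(e) e
)

print(group_means)
print(inherits(res, "error"))
```

The first result has four elements, one per input value, each holding the mean of its own group.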

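There is actually an even more compact way of expressing function(x) 1:length(x), namely the seq_along function. It is probably also safer, because it behaves correctly when passed a vector of length zero: it returns an empty vector, whereas the anonymous function form would misbehave by returning 1:0, that is, c(1, 0):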

ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=seq_along )
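A quick way to see the difference is to try both forms on an empty vector, and then check the result on the question's grouping columns. This is a small sketch; `empty`, `a` and `b` are illustrative names, and the Value column is omitted because it does not affect the index.

```r
empty <- integer(0)

a <- seq_along(empty)   # zero-length result
b <- 1:length(empty)    # 1:0 counts down, giving two elements

print(length(a))
print(b)

df <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200)
)

# Same call as above, with seq_len(nrow(df)) in place of 1:nrow(df)
df$Index <- ave(seq_len(nrow(df)), df$Dim1, factor(df$Dim2), FUN = seq_along)
print(df$Index)
```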

Answer 2

Here, you could use data.table:

library(data.table)
df <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

df[, index := seq_len(.N), by = list(Dim1, Dim2)]
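As a possible alternative, data.table also provides rowid(), which should give the same within-group counter without a by clause. The sketch below assumes a reasonably recent data.table version; `dt` and `index2` are names chosen for illustration, and the Value column is omitted.

```r
library(data.table)

dt <- data.table(
  Dim1 = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2 = c(100, 100, 100, 100, 200, 200, 100, 200)
)

# Counter within each Dim1/Dim2 group, as in the line above
dt[, index := seq_len(.N), by = list(Dim1, Dim2)]

# Same counter via rowid(), computed over the grouping columns directly
dt[, index2 := rowid(Dim1, Dim2)]

print(dt)
```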

Answer 3

Is this what you are trying to achieve?

library(ggplot2)
df <- data.frame(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200),
  Value = sample(1:10, 8)
)
df$index <- c(1,2,3,4,1,2,1,1)

ggplot(df,aes(x=index,y=Value))+geom_point()+facet_wrap(Dim1~Dim2)

The output is a plot of Value against index, with one facet per Dim1/Dim2 combination.

r - How to add a row index to a data frame based on a combination of factors
