Question

I'm new to R and I'm struggling with a problem. Can anyone point out where I'm going wrong? I have the following data:

   Score    TestID
   1536      2
   16000     18000
   1         15
   7         1800 
   738       256

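There are 25,000 test IDs, and each TestID has an associated score. In this case, scores range from 0 to 16000. I need to plot the number of unique TestIDs that fall within a given score range, i.e.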

  ScoreRange      # of unique TestId
  0 - 16000              ?
  10 - 16000             ?
  20 - 16000             ?
     .                   . 
     .                   .

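I wrote some code that uses a step size of 10 for the ranges and finds the unique TestIDs in each range. I haven't got to the plotting yet, but I'm having trouble getting the output in the format above.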

    final <- matrix(0, ncol = 2, nrow = length(seq(1, max(Combined$Score), 10)))

    for (i in seq(1, max(Combined$Score), 10)) {
      comp <- subset(Combined, Score >= i)
      unik <- length(unique(comp$TestID))
      final[, c(1, 2)] <- c(i, unik)
    }

I get a very strange result, basically two repeated values. Where am I going wrong?

Answer 1

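Edit: Your problem is that when you write the result of each for-loop iteration to the "final" matrix, you don't specify which row of the matrix to write to. Because the row index is empty, `final[, c(1, 2)] <- c(i, unik)` overwrites the whole matrix on every iteration, recycling the two values to fill it, which is why you see two repeated values. To fix this, I created a "counter" variable, set it to 0 before the for loop, and add 1 to it on each iteration. The counter indicates which row of the matrix the result should be written to. Try this: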

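# Lower bounds of the ranges: 0, 10, 20, ..., up to the maximum score
steps <- seq(0, max(Combined$Score), 10)
final <- matrix(0, ncol = 2, nrow = length(steps))
counter <- 0
for (i in steps) {
  counter <- counter + 1                  # row to fill on this iteration
  comp <- subset(Combined, Score >= i)    # rows with a score of at least i
  unik <- length(unique(comp$TestID))     # distinct TestIDs among those rows
  final[counter, c(1, 2)] <- c(i, unik)
}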
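
For comparison, here is a minimal sketch of the same table built without a counter, by applying the count to every lower bound with vapply(). The toy data frame reuses the five rows shown in the question, and the column names ScoreFrom and UniqueTestIDs are only illustrative, not something from the original post.

```r
# Toy stand-in for the asker's data (the five rows shown in the question).
Combined <- data.frame(
  Score  = c(1536, 16000, 1, 7, 738),
  TestID = c(2, 18000, 15, 1800, 256)
)

# Lower bounds of the ranges: 0, 10, 20, ..., up to the maximum score.
lower <- seq(0, max(Combined$Score), by = 10)

# For each lower bound, count the distinct TestIDs whose Score is at least that bound.
n_unique <- vapply(
  lower,
  function(lo) length(unique(Combined$TestID[Combined$Score >= lo])),
  integer(1)
)

# Illustrative column names; one row per range "lower bound to max score".
final <- data.frame(ScoreFrom = lower, UniqueTestIDs = n_unique)
head(final)
```

Once the table exists, something like plot(final$ScoreFrom, final$UniqueTestIDs, type = "s") should give the step plot the question asks for.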

Answer 2

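Another solution you might consider is generating a histogram with hist(). The breaks argument gives you the flexibility over step size that you mentioned. Anyway, I created a mini version of Combined as an example: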

Combined <- data.frame(Score = sample(0:16000,15),
                       TestID = sample(0:25000,15))
> Combined
   Score TestID
1  11456  14802
2  15854   1575
3  13738  22466
4   1659  14022
5  10671   4213
   ...    ...

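Combined.hist <- hist(Combined$Score,
                      breaks = seq(0, 16000))
# Note: hist() already places scores of 0 in the first bin (include.lowest = TRUE),
# so they are counted both in the Score = 0 row and in the Score = 1 row.
final <- data.frame(Score = Combined.hist$breaks,
                    Counts = c(sum(Combined$Score == 0),
                               Combined.hist$counts))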
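
To get from per-bin counts to the "lower bound to 16000" ranges in the question, one option is to use bins of width 10 and accumulate the counts from the right. The sketch below assumes each TestID appears only once in the data, because hist() counts rows rather than unique TestIDs. The names at_least, ScoreFrom, and Rows are only illustrative.

```r
set.seed(1)  # only so the toy data is reproducible
Combined <- data.frame(Score  = sample(0:16000, 15),
                       TestID = sample(0:25000, 15))

# Bins of width 10, closed on the left: [0, 10), [10, 20), ..., [15990, 16000].
h <- hist(Combined$Score,
          breaks = seq(0, 16000, by = 10),
          right = FALSE,
          plot = FALSE)

# Reverse cumulative sum: number of rows whose Score is >= the left edge of each bin.
at_least <- rev(cumsum(rev(h$counts)))

# Drop the last break so there is one left edge per bin.
final <- data.frame(ScoreFrom = h$breaks[-length(h$breaks)],
                    Rows = at_least)
head(final)
```

If a TestID can appear on more than one row, these row counts will overstate the number of unique TestIDs, and a loop over the lower bounds with unique() is the safer route.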

Using a for loop to create subsets of a data frame and find the unique values of another column
