Question

So, I have a small database and I'm working in RStudio. I have a situation where I need to classify some students according to the courses they failed.

for (i in 1:(nrow(BDbruno2)-1))
{
  #avoiding exploitations, i tested without it but my problem still continues
  if(i >= nrow(BDbruno2))
  {
    break 
  }else 
  {
    #checking is the codes are still the same
    if((BDbruno2$Code[i] == BDbruno2$Code[i+1]) && (i < nrow(BDbruno2)))
    {
      auxIndice <- BDbruno2$nLinha[i]
      auxTurmas <- BDbruno2$tempo[i]

      for(j in (i+1):nrow(BDbruno2))
      {
        #checking if codes are the same, if FALSE, i save all classes and save in a string for all codes
        if(BDbruno2$Code[j-1] != BDbruno2$Code[j]){
          BDbruno2$turmasCalc1[auxIndice] <- paste0(auxTurmas, collapse = " ")
          #skipping the same codes that i already checked
          i <- j
          #i tested without this break, only makes my code to take longer to finish
          break
        } else
        {
          #saving all rows where the code is the same
          auxIndice <- c(auxIndice, BDbruno2$nLinha[j])
          #this line below is where i get my problem:
          #it receives the classes from the same code, but when going further in the loop, this var gets messed
          auxTurmas <- c(auxTurmas, BDbruno2$tempo[j])
        }
      }
    }
  }
}

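The auxTurmas variable doesn't return the expected result inside the loop, but it does when I run the code line by line.

There are four rows that belong to the same student (different rows, same code), and I save all of his classes in a new variable (turmasCalc1), which should be 1 4 7 8. All of his rows should get these numbers, but in the for loop only the first one is correct: the second fails (4 7 8), and the other two also fail (7 8).

The strange thing is that if I run it starting at the row where nLinha is 46, it works fine, but I need it to handle all cases (and there are many like this one). I'm not sure, but it doesn't seem to be a problem with the break I'm using, and I can't see what is making this weird thing happen. Can anyone shed some light on it?

Edit: Sorry for the lack of information. Below is a sample of the data frame. It should return 1 4 7 8 on the rows with nLinha = 46:49. Similar problems show up at other places in the same data frame.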

Code             Calculo1_Turma          tempo   turmasCalc1  nLinha
1635340632       2014/1 - MAT154-B          11         11     45
1638717605       2009/1 - MAT154-E          1     1 4 7 8     46
1638717605       2010/3 - MAT154-I          4       4 7 8     47
1638717605       2012/1 - MAT154-A          7         7 8     48
1638717605       2012/3 - MAT154-D          8         7 8     49
1643222643       2011/1 - MAT154-D          5         5 6     50
1643222643       2011/3 - MAT154-B          6         5 6     51
1645485641       2009/1 - MAT154-B          1           1     52

This is what I want:

Code             Calculo1_Turma          tempo   turmasCalc1  nLinha
1635340632       2014/1 - MAT154-B          11         11     45
1638717605       2009/1 - MAT154-E          1     1 4 7 8     46
1638717605       2010/3 - MAT154-I          4     1 4 7 8     47
1638717605       2012/1 - MAT154-A          7     1 4 7 8     48
1638717605       2012/3 - MAT154-D          8     1 4 7 8     49
1643222643       2011/1 - MAT154-D          5         5 6     50
1643222643       2011/3 - MAT154-B          6         5 6     51
1645485641       2009/1 - MAT154-B          1           1     52

Edit2: Sorry again. Here is the code generated by dput:

BDbruno2 <- structure(list(Code = c("1634171640", "1634171640", "1634171640", "1635340632", "1638717605", "1638717605", "1638717605", "1638717605", "1643222643", "1643222643", "1645485641"), Calculo1_Turma = c("2009/1 - MAT154-D", "2009/3 - MAT154-A", "2010/3 - MAT154-I", "2014/1 - MAT154-B", "2009/1 - MAT154-E", "2010/3 - MAT154-I", "2012/1 - MAT154-A", "2012/3 - MAT154-D", "2011/1 - MAT154-D", "2011/3 - MAT154-B", "2009/1 - MAT154-B"), tempo = c(1, 2, 4, 11, 1, 4, 7, 8, 5, 6, 1), turmasCalc1 = c("1", "2", "4", "11", "1 4 7 8", "4 7 8", "7 8", "7 8", "5 6", "5 6", "1"), nLinha = 42:52), .Names = c("Code", "Calculo1_Turma", "tempo", "turmasCalc1", "nLinha"), row.names = c(162L, 305L, 714L, 3880L, 210L, 715L, 887L, 924L, 2157L, 2446L, 60L), class = "data.frame")

The code above produces a few rows, some of which come out correctly. To recap: I should get 1 4 7 8 in turmasCalc1 for nLinha in 46:49, but there seems to be a problem with the i index. The problem occurs whenever the same code appears 3 or more times, not only 4.

Answer 1

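I'm going to take a stab at this. I may not get your output exactly right (because I don't really know what you want), but I hope this gets you started in the right direction.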

library(dplyr)
BDbruno2 %>%
  group_by(CODIGO) %>%
  summarize(turmasCalc1 = paste(tempo, collapse = " ")) %>%
  left_join(select(BDbruno2, -turmasCalc1), .)

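This is the R-style way to solve the problem. R is an interpreted language: it is great for changing things frequently on the fly, but fairly slow for heavy computation. The remedy is what R calls vectorized functions, meaning the heavy computation and the looping are implemented in a compiled language and R just passes the data along. This means that writing loops IN R is almost always a bad idea; we want to call a vectorized version of the function, which does the looping for us. Here is a longer version of what I just said, with some examples: http://alyssafrazee.com/2014/01/29/vectorization.html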
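As a small illustration of why index bookkeeping in an explicit loop is fragile, here is a minimal sketch. It is my reading of the symptom described in the question, not something confirmed in the thread: in R, assigning to the loop variable inside a `for` body (as the `i <- j` line does) does not skip iterations, because `for` takes the next value from the sequence on every pass. The names `visited` and `tempo` below are made up for the example.

```r
# Assigning to the loop variable does not skip iterations in R:
# `for` takes the next value from 1:5 regardless of what the body did to i.
visited <- integer(0)
for (i in 1:5) {
  visited <- c(visited, i)
  i <- 5  # attempt to jump ahead; has no effect on the next iteration
}
visited
# [1] 1 2 3 4 5

# The vectorized alternative needs no index bookkeeping at all:
# paste() with collapse joins a whole vector in one call.
tempo <- c(1, 4, 7, 8)
paste(tempo, collapse = " ")
# [1] "1 4 7 8"
```

If that reading is right, the rows of a repeated code get visited again after the inner loop finishes, which would match the 4 7 8 and 7 8 values in the question.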

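More recently, another revolution has taken place in R, in the form of a suite of packages called the tidyverse, one of which I used above: dplyr. The whole suite is great, but the most important part is the use of pipes (%>%). All they do is take the result of the previous function and pass it as the first argument of the next function, but they let us linearize function calls and see what is going on.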
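To make the pipe concrete, here is a minimal sketch comparing a nested call with its piped equivalent. It assumes dplyr is installed (it re-exports `%>%` from magrittr); the vector `x` and the names `nested` and `piped` are made up for the example.

```r
library(dplyr)  # re-exports the %>% pipe from magrittr

x <- c(4, 9, 16)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form: each result becomes the first argument of the next call.
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
# [1] TRUE
```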

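With the code above, I first group by CODIGO (I assume it is the same as Code in the for loop you provided). There is no longer any need to check whether the codes are the same: we are looking at the data in chunks, and the code is the same for everything within a chunk. The next function is summarize, which says we want to produce one summary per code, and we get it by pasting together the elements of tempo.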
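The grouping and summarizing step can be sketched on a few rows taken from the sample table in the question. This assumes the grouping column is named `Code`, as in the posted data, rather than CODIGO; the data frame name `turmas` and the result name `por_codigo` are made up for the example.

```r
library(dplyr)

# A few rows from the question's sample: two students, one code each.
turmas <- data.frame(
  Code  = c(rep("1638717605", 4), rep("1643222643", 2)),
  tempo = c(1, 4, 7, 8, 5, 6),
  stringsAsFactors = FALSE
)

# One output row per Code, with all tempo values pasted into one string.
por_codigo <- turmas %>%
  group_by(Code) %>%
  summarize(turmasCalc1 = paste(tempo, collapse = " "))

por_codigo$turmasCalc1
# [1] "1 4 7 8" "5 6"
```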

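Finally, we use left_join to merge the summary back into the original data set. Here I want turmasCalc1 to be the last variable rather than the first, so I want the original data set (BDbruno2) to come first. That is why the call has a single dot after the comma: I am overriding the default behavior of feeding the result in as the first argument and passing it as the second argument instead.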
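Putting the whole pipeline together on rows 45 to 52 of the question's sample gives a quick way to check the result. This is a sketch under two assumptions: the grouping column is called `Code` (as in the posted data), and the join key is spelled out with `by = "Code"` instead of being left for left_join to detect. The name `result` is made up for the example.

```r
library(dplyr)

# Rows 45:52 of the sample, including the incorrect turmasCalc1 values.
BDbruno2 <- data.frame(
  Code = c("1635340632", rep("1638717605", 4),
           rep("1643222643", 2), "1645485641"),
  tempo = c(11, 1, 4, 7, 8, 5, 6, 1),
  turmasCalc1 = c("11", "1 4 7 8", "4 7 8", "7 8", "7 8", "5 6", "5 6", "1"),
  nLinha = 45:52,
  stringsAsFactors = FALSE
)

result <- BDbruno2 %>%
  group_by(Code) %>%
  summarize(turmasCalc1 = paste(tempo, collapse = " ")) %>%
  # The dot places the summary second, so the original columns come first.
  left_join(select(BDbruno2, -turmasCalc1), ., by = "Code")

result$turmasCalc1[result$nLinha %in% 46:49]
# [1] "1 4 7 8" "1 4 7 8" "1 4 7 8" "1 4 7 8"
```

Every row of a student now carries the full list of classes, and the row order of the original data frame is preserved because it is the left-hand table in the join.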

R: 'if' condition inside this 'for' loop does not work correctly
