Fast way to extract columns from a table and give them a specific format in R

Time: 2016-06-29 15:20:02

Tags: r dataframe data.table

I have the following table:

          date id rank_year sales cost
 1: 2015-12-31  1         0   149  101
 2: 2014-12-31  1        -1   150  102
 3: 2013-12-31  1        -2   151  104
 4: 2012-12-31  1        -3   152  107
 5: 2011-12-31  1        -4   155   99
 6: 2015-12-31  2         0    84   55
 7: 2014-12-31  2        -2    83   55
 8: 2014-01-25  2        -3    80   56
 9: 2013-01-25  2        -4    81   57
10: 2012-01-25  2        -5    97   58

library(data.table)
DT <- data.table(as.IDate(c("2015-12-31", "2014-12-31", "2013-12-31",
                             "2012-12-31", "2011-12-31", "2015-12-31",
                             "2014-12-31", "2014-01-25", "2013-01-25",
                             "2012-01-25")),
                  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  c("0", "-1", "-2", "-3", "-4", "0", "-2", "-3", "-4", "-5"),
                  c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                  c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58))
setnames(DT, c("date", "id", "rank_year", "sales", "cost"))

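I need to isolate each column of this data.table so that its values can be put into another format. For example, my environment should end up with a new variable named sales that has the following format: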

   id -19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1
1:  1 @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA 155 152 151 150 149 @NA
2:  2 @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA  97  81  80  83 @NA  84 @NA

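I would break down what I have to do into the following steps:

  1. Create a new variable for each column in DT, and give it the name of that column

  2. The rows of the new variable should be unique(DT$id)

  3. The columns should always be -19 to 1; they represent the different rank_year values

  4. In each new variable, place the values from DT in the correct column, using rank_year as a lookup

I wrote the following code. It works, but it is far too slow: running it on the real dataset takes more than 3 days.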

    unique.id <- data.frame(unique(DT$id))
    variable.names <- colnames(DT)
    
    rank <- data.frame((-19:1))
    colnames(rank) <- "rank_year"
    
    col.names <- factor(rank[[1]])  
    row.names <- unique(DT$id)
    
    for (i in 1:length(variable.names)) {
    
      variable.results <- matrix(data = NA,
                                 nrow = dim(unique.id)[1],
                                 ncol = dim(rank)[1])
      colnames(variable.results) <- col.names
      row.names(variable.results) <- row.names
    
      for (j in 1:length(row.names)) {
        temp.data <- DT[DT$id == row.names[j], ]
        temp.data <- data.frame(temp.data)
        temp.data <- data.frame(temp.data["rank_year"], temp.data[i]) 
        temp.data <- merge(rank, temp.data, by = "rank_year", all.x = TRUE)
        variable.results[j, ] <- t(data.frame(temp.data[, 2]))
        variable.results[is.na(variable.results)] <- "@NA"
      }
      rm(temp.data)
    
      assign(variable.names[i], variable.results)
    }
    

1 Answer:

Answer 0: (score: 1)

This is a classic dcast operation. You need to pad the original data with the missing rank_year/id combinations, which can be done with a join:

    dcast(
      DT[CJ(rank_year = as.character(-19:1), id = id, unique = TRUE),
         on = c("rank_year", "id")],
      id ~ rank_year, value.var = "sales")
    #   id  -1 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19  -2  -3  -4  -5  -6  -7  -8  -9   0   1
    #1:  1 150  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 151 152 155  NA  NA  NA  NA  NA 149  NA
    #2:  2  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  83  80  81  97  NA  NA  NA  NA  84  NA

Note that the columns are ordered alphabetically because rank_year is a character vector. If you need numeric ordering, make it an integer, or reorder the columns afterwards. In any case, you should not define the rank_year values as characters.
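The suggestion to store rank_year as an integer can be sketched as follows. This is a minimal illustration rather than the answerer's own code: it rebuilds the example table with an integer rank_year, and the names grid, padded, value_cols and wide are made up for this example. It pads the data to the full -19:1 range with a join and then casts each value column separately.

```r
library(data.table)

# Example data from the question, but with rank_year stored as integer
# (an assumption of this sketch; the question stores it as character).
DT <- data.table(
  date = as.IDate(c("2015-12-31", "2014-12-31", "2013-12-31",
                    "2012-12-31", "2011-12-31", "2015-12-31",
                    "2014-12-31", "2014-01-25", "2013-01-25",
                    "2012-01-25")),
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  rank_year = c(0L, -1L, -2L, -3L, -4L, 0L, -2L, -3L, -4L, -5L),
  sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
  cost = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58)
)

# Every (id, rank_year) combination that must appear in the result.
grid <- CJ(id = unique(DT$id), rank_year = -19:1)

# Join DT onto the grid: combinations missing from DT get NA values.
padded <- DT[grid, on = c("id", "rank_year")]

# Cast each value column to wide format; the result is a named list
# with one wide table per column ("sales", "cost").
value_cols <- c("sales", "cost")
wide <- lapply(setNames(value_cols, value_cols), function(v) {
  dcast(padded, id ~ rank_year, value.var = v)
})

wide$sales
```

Because rank_year is an integer here, the columns come out in numeric order from -19 to 1. Keeping the results in a list avoids calling assign in a loop; if separate variables named after the columns are really needed, list2env(wide, envir = globalenv()) would create them.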