Initializing a data.table with heterogeneous types

Time: 2015-09-22 21:38:38

Tags: r data.table

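I need to build a large data.table in which each row is a user and the columns are attributes of different types. I need to fill the table in row by row. How should I initialize it?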

For example, if I do this:

dt.hetero <- data.table(matrix(-1, nrow=3, ncol=6))
names(dt.hetero) <- c("name", "lastname", "city", "age", "weight", "height")
dt.hetero[1, age:=34]
dt.hetero[1, name:="alice"]

it expects doubles everywhere, so I get a warning when I try to insert a string:

Warning messages:
1: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  NAs introduced by coercion
2: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.

Edit

I obtain the user data sequentially, so the process is:

  for each user:
    • get the user's data
    • copy the user's data into a row of the data.table

  return data.table

3 answers:

Answer 0: (score: 5)

You can specify the type of each column directly when you create the empty data.table:

dt.hetero <- data.table(name = character(3L), 
                        lastname = character(3L), 
                        city = character(3L), 
                        age = integer(3L), 
                        weight = double(3L), 
                        height = double(3L))

You can replace the number "3" with the number of rows you actually want.
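As a rough sketch of how such a pre-typed table might then be filled row by row, the loop below uses `set()`, which assigns by reference. The `users` list and its values are made up for illustration and stand in for data fetched one user at a time; only three of the columns are shown to keep it short.

```r
library(data.table)

n <- 3L
dt <- data.table(name = character(n), age = integer(n), weight = double(n))

# Hypothetical per-user records (illustrative values, not from the question)
users <- list(
  list(name = "alice", age = 34L, weight = 61.5),
  list(name = "bob",   age = 41L, weight = 80.2),
  list(name = "carol", age = 29L, weight = 55.0)
)

for (i in seq_along(users)) {
  u <- users[[i]]
  # Assign one row in place; each value matches its column's declared type
  set(dt, i = i, j = names(u), value = u)
}

str(dt)
```

Because every value already has the column's type (note the `L` suffix on the ages), no coercion warning is triggered.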

Answer 1: (score: 3)

  

"I need to fill the table row by row."

If you are doing this by hand, consider...

res <- fread("
  name              age        weight
  Bob               101        111
  Alice             33         77     ")

...or

rows <- list(
  list(name = "Bob"    , age = 101, weight = 111 ),
  list(name = "Alice"  , age = 33 , weight = 77  ) 
)

res2 <- rbindlist(rows)

If you are collecting the data sequentially, the second approach also works:

rows <- vector("list",3)

rows[[1]] <- list(name = "Bob"    , age = 101, weight = 111 )
rows[[2]] <- list(name = "Alice"  , age = 33 , weight = 77  ) 
rows[[3]] <- list(name = "Cadmus" , age = 44 , weight = 55  ) 

res2 <- rbindlist(rows)

Naturally, this also works in a loop:

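for (i in seq_along(rows)) {
  # ... do_stuff to find row info ...
  # get_row_info() is a hypothetical placeholder for that step; it should
  # return a named list such as list(name = "Bob", age = 101, weight = 111)
  rows[[i]] <- get_row_info(i)
}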
res2 <- rbindlist(rows)
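To make the link to the original question explicit, here is a small sketch showing that `rbindlist()` takes each column's type from the list elements, so mixed types need no up-front declaration. `fetch_user()` is a hypothetical stand-in for whatever retrieves one user's data, and the values it returns are invented.

```r
library(data.table)

# Hypothetical stand-in for the real per-user lookup
fetch_user <- function(id) {
  list(name = paste0("user", id), age = 20L + id, weight = 60 + id / 2)
}

ids <- 1:3
rows <- vector("list", length(ids))  # preallocate one slot per user
for (i in seq_along(ids)) {
  rows[[i]] <- fetch_user(ids[i])
}

res <- rbindlist(rows)
sapply(res, class)  # inspect the inferred column types
```

Preallocating `rows` with `vector("list", n)` avoids growing the list inside the loop.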

Answer 2: (score: 0)

This is a very slow way of working in R; see the "Second Circle" of The R Inferno. You can make the process more efficient by 'vectorizing' it:

users = c('John','Jill','James')
ages = c(25,53,37)

# of course there is: data.frame(user = users, age=ages), but assuming that's
# not possible in this case..

users_list <- lapply(1:3, FUN=function(i){
  return(data.frame(user = users[i],
                    age = ages[i]))
})

do.call('rbind', users_list)
   user age
1  John  25
2  Jill  53
3 James  37
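Since the question is about data.table, the same idea could be sketched with `rbindlist()` instead of `data.frame()` plus `do.call('rbind', ...)`. This is only one possible variant, assuming each per-user result can be expressed as a named list.

```r
library(data.table)

users <- c("John", "Jill", "James")
ages  <- c(25, 53, 37)

# Build one named list per user, then bind them all in a single call
users_list <- lapply(seq_along(users), function(i) {
  list(user = users[i], age = ages[i])
})

res <- rbindlist(users_list)
res
```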