Question

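I need to build a large data.table in which each row is a user and the columns are attributes of different types. I need to fill the table row by row. How should I initialize it?

For example, if I do this: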

dt.hetero <- data.table(matrix(-1, nrow=3, ncol=6))
names(dt.hetero) <- c("name", "lastname", "city", "age", "weight", "height")
dt.hetero[1, age:=34]
dt.hetero[1, name:="alice"]

Because the table was built from a numeric matrix, every column is a double, so I get warnings when I try to insert a string:

Warning messages:
1: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  NAs introduced by coercion
2: In `[.data.table`(dt.hetero, 1, `:=`(name, "alice")) :
  Coerced 'character' RHS to 'double' to match the column's type. Either change the target column to 'character' first (by creating a new 'character' vector length 3 (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to 'double' (e.g. 1L, NA_[real|integer]_, as.*, etc) to make your intent clear and for speed. Or, set the column type correctly up front when you create the table and stick to it, please.

Edit

I get the user data sequentially. So the process is:

for each user:
    get the user data
    copy the user data into a row of the data.table

return the data.table

Answer 1

You can specify the type of each column directly when you create the empty data.table:

dt.hetero <- data.table(name = character(3L), 
                        lastname = character(3L), 
                        city = character(3L), 
                        age = integer(3L), 
                        weight = double(3L), 
                        height = double(3L))

Change the number "3" to the number of rows you actually want.
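
With the column types fixed up front, each row can then be filled in place. A minimal sketch, assuming three users and made-up values (both are illustrative, not from the question); `set()` is data.table's function for assigning by reference:

```r
library(data.table)

# Preallocate a typed, empty table (3 rows is an assumption for this sketch)
n <- 3L
dt.hetero <- data.table(name = character(n),
                        lastname = character(n),
                        city = character(n),
                        age = integer(n),
                        weight = double(n),
                        height = double(n))

# Illustrative values only; the right-hand-side types match the column
# types, so no coercion warning is raised
dt.hetero[1, `:=`(name = "alice", age = 34L)]

# set() performs the same kind of assignment by reference and has less
# overhead than `[.data.table` when called repeatedly in a loop
set(dt.hetero, i = 2L, j = "name", value = "bob")
set(dt.hetero, i = 2L, j = "weight", value = 71.5)
```

Note that `age` is an integer column here, so it is assigned `34L` rather than `34`.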

Answer 2

"I need to fill the table row by row."

If you are doing it by hand, consider...

res <- fread("
  name              age        weight
  Bob               101        111
  Alice             33         77     ")

...or

rows <- list(
  list(name = "Bob"    , age = 101, weight = 111 ),
  list(name = "Alice"  , age = 33 , weight = 77  ) 
)

res2 <- rbindlist(rows)

If you collect the data sequentially, you can also use the second approach:

rows <- vector("list",3)

rows[[1]] <- list(name = "Bob"    , age = 101, weight = 111 )
rows[[2]] <- list(name = "Alice"  , age = 33 , weight = 77  ) 
rows[[3]] <- list(name = "Cadmus" , age = 44 , weight = 55  ) 

res2 <- rbindlist(rows)

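Obviously, this also works in a loop:

for (i in seq_along(rows)) {
  # ... do_stuff to find row info ...
  # placeholder values: put the real row info here
  rows[[i]] <- list(name = NA_character_, age = NA_real_, weight = NA_real_)
}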
res2 <- rbindlist(rows)
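
A runnable version of the loop, as a sketch. `fetch_user()` is a hypothetical stand-in for whatever retrieves one user's data; it is not part of data.table, and the values it returns are made up:

```r
library(data.table)

# Hypothetical helper: returns one user's data as a named list
fetch_user <- function(i) {
  users <- list(
    list(name = "Bob",    age = 101, weight = 111),
    list(name = "Alice",  age = 33,  weight = 77),
    list(name = "Cadmus", age = 44,  weight = 55)
  )
  users[[i]]
}

n_users <- 3L
rows <- vector("list", n_users)   # preallocate the list, not the table

for (i in seq_len(n_users)) {
  rows[[i]] <- fetch_user(i)
}

# Column types are inferred from the list elements
res2 <- rbindlist(rows)
```

Because each list element carries its own type, `name` comes out as character and `age` and `weight` as numeric, with no up-front type declaration.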

Answer 3

This is a very slow way to work in R; see the "Second Circle" of The R Inferno. You can "vectorize" the process more efficiently:

users = c('John','Jill','James')
ages = c(25,53,37)

# of course there is: data.frame(user = users, age=ages), but assuming that's
# not possible in this case..

users_list <- lapply(1:3, FUN=function(i){
  return(data.frame(user = users[i],
                    age = ages[i]))
})

do.call('rbind', users_list)
   user age
1  John  25
2  Jill  53
3 James  37
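
Since the question is about data.table, the same pattern can also be written with `rbindlist()`, which the data.table documentation describes as a faster equivalent of `do.call("rbind", l)`. A sketch reusing the `users` and `ages` vectors from above:

```r
library(data.table)

users <- c("John", "Jill", "James")
ages <- c(25, 53, 37)

# Build one small named list per user, then bind them all at once
users_list <- lapply(seq_along(users), function(i) {
  list(user = users[i], age = ages[i])
})

res <- rbindlist(users_list)
```

Plain lists are used instead of one-row data frames because they are cheaper to create and `rbindlist()` accepts them directly.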

Initializing a data.table with heterogeneous types
