Question

Suppose I have a dataset of students reporting the first, last, and favorite book they read over the summer...

library(data.table)

d <- data.table(
      student_id = 1:4,
      first_book = c('Dune','Starship Troopers', 
                     'The Moon is a Harsh Mistress', NA_character_),
      last_book  = c('The Martian Chronicles','Foundation',
                     'The Moon is a Harsh Mistress', NA_character_),
      favorite_book = c('I, Robot','Foundation',
                        'The Moon is a Harsh Mistress', NA_character_) )

...and I want to know how many unique books each student reported. (I want to find that student 1 read three books, student 2 read two, student 3 read exactly one, and student 4 read none.)

I know I can use melt and length(unique()) like this:

melt(d, id.vars='student_id', 
     value.name='title')[, .(count = length(unique(title[!is.na(title)]))),
                          keyby=student_id]
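A minimal sketch of the same melt approach, swapping length(unique(title[!is.na(title)])) for data.table's uniqueN(), which drops NAs itself via its na.rm argument. This assumes a data.table version whose uniqueN() supports na.rm; the result name res_melt is just for illustration.

```r
library(data.table)

d <- data.table(
  student_id    = 1:4,
  first_book    = c('Dune', 'Starship Troopers',
                    'The Moon is a Harsh Mistress', NA_character_),
  last_book     = c('The Martian Chronicles', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_),
  favorite_book = c('I, Robot', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_)
)

# Long format: one row per (student, book slot). NA rows are kept
# (melt's na.rm defaults to FALSE), so student 4 still gets a group.
# uniqueN(..., na.rm = TRUE) then counts distinct non-NA titles.
res_melt <- melt(d, id.vars = 'student_id', value.name = 'title')[
  , .(count = uniqueN(title, na.rm = TRUE)), keyby = student_id]

print(res_melt)
```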

But melting isn't necessary; I can instead use apply and length(unique()) like this:

d[, apply(.SD, 1, function(x) length(unique(x[!is.na(x)]))), 
  .SDcols = c('first_book', 'last_book', 'favorite_book'),
  keyby = student_id] # or by .I if one row per student
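A sketch that avoids apply() by grouping on student_id and flattening each group's .SD with unlist() before calling uniqueN(). It assumes one row per student (as in the example), so each group is a single row; with several rows per student it would count distinct titles across all of that student's rows, like the melt version. The names cols and res_by are illustrative.

```r
library(data.table)

d <- data.table(
  student_id    = 1:4,
  first_book    = c('Dune', 'Starship Troopers',
                    'The Moon is a Harsh Mistress', NA_character_),
  last_book     = c('The Martian Chronicles', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_),
  favorite_book = c('I, Robot', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_)
)

cols <- c('first_book', 'last_book', 'favorite_book')

# Within each student group, unlist(.SD) turns the selected columns into
# one character vector; uniqueN(..., na.rm = TRUE) counts distinct non-NA values.
res_by <- d[, .(count = uniqueN(unlist(.SD), na.rm = TRUE)),
            keyby = student_id, .SDcols = cols]

print(res_by)
```

This still evaluates j once per group, so on very many rows it may not be much faster than apply(); its main advantage is readability.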

Is apply the preferred, idiomatic data.table approach, or is there a better way to write this?
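One possible alternative when there are only a few columns but many rows is to work column by column instead of row by row: a value in column j counts as a new distinct book if it is non-NA and differs from every earlier column in the same row. The helper count_distinct_rowwise below is hypothetical (not part of data.table) and is only a sketch; it does O(p²) vectorized comparisons for p columns and makes no per-row R function calls.

```r
library(data.table)

d <- data.table(
  student_id    = 1:4,
  first_book    = c('Dune', 'Starship Troopers',
                    'The Moon is a Harsh Mistress', NA_character_),
  last_book     = c('The Martian Chronicles', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_),
  favorite_book = c('I, Robot', 'Foundation',
                    'The Moon is a Harsh Mistress', NA_character_)
)

# Hypothetical helper: number of distinct non-NA values per row across `cols`.
count_distinct_rowwise <- function(dt, cols) {
  n <- integer(nrow(dt))
  for (j in seq_along(cols)) {
    x <- dt[[cols[j]]]
    # A value can only be "new" if it is not NA ...
    is_new <- !is.na(x)
    # ... and it differs from every earlier column in the same row
    # (an NA in an earlier column never matches).
    for (k in seq_len(j - 1L)) {
      y <- dt[[cols[k]]]
      is_new <- is_new & (is.na(y) | x != y)
    }
    n <- n + is_new  # integer + logical stays integer
  }
  n
}

cols <- c('first_book', 'last_book', 'favorite_book')
d[, count := count_distinct_rowwise(.SD, cols), .SDcols = cols]

print(d)
```

Because the loop runs over columns rather than rows, its cost grows with the square of the number of columns, so this trade-off only makes sense for a small, fixed set of columns like the three here.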

Count unique values per row across selected data.table columns
