Enumerate groups within groups in a data.table

Time: 2019-06-21 12:02:42

Tags: r data.table

This is related to several duplicates (1, 2, 3), but I am stuck on a slightly different problem. So far, I have only seen a pandas solution.

I have a data table with a group column gr and a class column cl (constructed below).

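I want to enumerate the unique classes within each group, i.e. add an id column that numbers the distinct cl values and restarts at 1 in each gr. The input data: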

dt = data.table(gr = rep(letters[1:2], each = 6), 
                cl = rep(letters[1:4], each = 3))

    gr cl
 1:  a  a
 2:  a  a
 3:  a  a
 4:  a  b
 5:  a  b
 6:  a  b
 7:  b  c
 8:  b  c
 9:  b  c
10:  b  d
11:  b  d
12:  b  d

3 answers:

Answer 0 (score: 3)

You can do the following (you may need to sort the data first):

dt[, id := cumsum(!duplicated(cl)), by = gr]

    gr cl id
 1:  a  a  1
 2:  a  a  1
 3:  a  a  1
 4:  a  b  2
 5:  a  b  2
 6:  a  b  2
 7:  b  c  1
 8:  b  c  1
 9:  b  c  1
10:  b  d  2
11:  b  d  2
12:  b  d  2

The same with dplyr:

library(dplyr)

dt %>%
  group_by(gr) %>%
  mutate(id = cumsum(!duplicated(cl)))

Or a possibility similar to rleid(), built on base R's rle():

dt %>%
  group_by(gr) %>%
  mutate(id = with(rle(cl), rep(seq_along(lengths), lengths)))
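
The remark about sorting first matters because both expressions above depend on row order. The following is a minimal sketch on a made-up, unsorted table (the name unsorted and its values are assumptions for illustration, not data from the question). It also shows match(cl, unique(cl)), which is swapped in here as an order-independent alternative and is not part of the original answer.

```r
library(data.table)

# Hypothetical unsorted input: class "a" reappears after "b" within group "x".
unsorted <- data.table(gr = rep("x", 5),
                       cl = c("a", "b", "a", "c", "b"))

# cumsum(!duplicated()) only increments on first appearances, so a class
# that comes back later inherits the id of the class that preceded it.
unsorted[, id_cumsum := cumsum(!duplicated(cl)), by = gr]

# rle() numbers consecutive runs, so every reappearance gets a fresh id.
unsorted[, id_rle := with(rle(cl), rep(seq_along(lengths), lengths)), by = gr]

# match() against unique() gives one stable id per class, numbered in
# order of first appearance, with no sorting needed.
unsorted[, id_match := match(cl, unique(cl)), by = gr]

print(unsorted)
```

On sorted data such as the dt in the question, all three columns agree; they only diverge when rows of the same class are not adjacent.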

Answer 1 (score: 3)

Try:

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
#    gr cl id
# 1:  a  a  1
# 2:  a  a  1
# 3:  a  a  1
# 4:  a  b  2
# 5:  a  b  2
# 6:  a  b  2
# 7:  b  c  1
# 8:  b  c  1
# 9:  b  c  1
#10:  b  d  2
#11:  b  d  2
#12:  b  d  2
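
Since rleid() numbers runs of consecutive equal values, it gives one id per class only when equal classes sit next to each other. A hedged sketch of one way to handle that, assuming it is acceptable to reorder the rows: sort with setorder() before calling rleid(). The table dt2 and its values are made up for illustration.

```r
library(data.table)

# Hypothetical input where the rows of each class are not adjacent.
dt2 <- data.table(gr = c("a", "a", "a", "b", "b", "b"),
                  cl = c("b", "a", "b", "d", "c", "d"))

# Without sorting, rleid() sees three separate runs in each group.
dt2[, id_runs := rleid(cl), by = gr]

# Sorting by group and class (by reference) makes equal classes adjacent,
# so each class forms exactly one run.
setorder(dt2, gr, cl)
dt2[, id := rleid(cl), by = gr]

print(dt2)
```

Note that setorder() reorders dt2 in place; if the original row order must be kept, an order-independent expression is the safer choice.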

Answer 2 (score: 0)

An alternative solution using factor, which does not require sorting first:

library(dplyr)

dt %>%
  group_by(gr) %>%
  mutate(id = as.numeric(factor(cl))) %>%
  ungroup()

# # A tibble: 12 x 3
#   gr    cl       id
#   <chr> <chr> <dbl>
# 1 a     a         1
# 2 a     a         1
# 3 a     a         1
# 4 a     b         2
# 5 a     b         2
# 6 a     b         2
# 7 b     c         1
# 8 b     c         1
# 9 b     c         1
#10 b     d         2
#11 b     d         2
#12 b     d         2

Note that this automatically assigns a number/id based on the alphabetical order of the cl values within each gr group.
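
To make the note on alphabetical order concrete, here is a small sketch written with data.table rather than dplyr (a translation of the same idea, not the answer's own code). The table dt3 is made up for illustration. Passing levels = unique(cl) is one possible way to number classes by first appearance instead, if that is what is wanted.

```r
library(data.table)

# Hypothetical input where classes do not appear in alphabetical order.
dt3 <- data.table(gr = c("a", "a", "a", "b", "b"),
                  cl = c("z", "y", "z", "d", "c"))

# Default factor levels are sorted, so ids follow alphabetical order.
dt3[, id_alpha := as.integer(factor(cl)), by = gr]

# Passing levels = unique(cl) numbers classes by first appearance instead.
dt3[, id_seen := as.integer(factor(cl, levels = unique(cl))), by = gr]

print(dt3)
```

as.integer() is used here so the ids are stored as integers; as.numeric() in the answer above yields the same values as doubles.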