R data.table: How to "label" consecutive values in a column?

Time: 2017-04-24 20:05:02

Tags: r dataframe data.table

I have the following data.table (it is fine if you work with it as a data.frame):

 [...]

I would like to:

(1) create a column that is 1 if the value is <= 1000, and 0 otherwise

(2) then number these distinct groupings of 1s

For this data.table:

library(data.table)

dt <- data.table(first_column = c("item1", "item2", "item3", "item4", "item5", "item6", "item7"),
                 second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
                 third_column = c(50, 10, 18, 3092, 731, 189, 1991))

> dt
   first_column second_column third_column
1:        item1          cat1           50
2:        item2          cat1           10
3:        item3          cat1           18
4:        item4          cat2         3092
5:        item5          cat2          731
6:        item6          cat2          189
7:        item7          cat2         1991

Step (1) on its own only yields a column of zeros and ones. The result I am after looks like this:

> dt

  first_column second_column  third_column  labels
0        item1          cat1            50     1
1        item2          cat1            10     1
2        item3          cat1            18     1
3        item4          cat2          3092     0
4        item5          cat2           731     2
5        item6          cat2           189     2
6        item7          cat2          1991     0

How can I label these "groupings" of 1s?

1 Answer:

Answer 0: (score: 3)

We group by 'second_column', specify the logical condition in 'i' (third_column <= 1000), assign (:=) 'labels' as .GRP, and then replace the NA values with 0 in the next step:

dt[third_column <= 1000, labels := .GRP, by = second_column][is.na(labels), labels := 0L][]
#     first_column second_column third_column labels
#1:        item1          cat1           50      1
#2:        item2          cat1           10      1
#3:        item3          cat1           18      1
#4:        item4          cat2         3092      0
#5:        item5          cat2          731      2
#6:        item6          cat2          189      2
#7:        item7          cat2         1991      0
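
To make the two chained steps easier to follow, here is the same approach split apart, as a minimal sketch on the question's data. The intermediate `print()` calls are only there for illustration and are not part of the one-liner above.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  first_column  = paste0("item", 1:7),
  second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
  third_column  = c(50, 10, 18, 3092, 731, 189, 1991)
)

# Step 1: only the rows that pass the filter in 'i' are grouped by
# second_column; .GRP is the running group counter (1 for cat1, 2 for cat2).
# Rows that fail the filter are not touched, so they end up as NA.
dt[third_column <= 1000, labels := .GRP, by = second_column]
print(dt$labels)  # 1 1 1 NA 2 2 NA

# Step 2: fill the untouched rows with 0
dt[is.na(labels), labels := 0L]
print(dt$labels)  # 1 1 1 0 2 2 0
```

Since .GRP counts only the groups that survive the filter, a category with no value <= 1000 would not use up a label number.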

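Alternatively, take the cumulative sum of the logical vector (!duplicated(second_column)) and multiply it by the other logical vector (third_column <= 1000). This second option is more compact, but it relies on the rows of each 'second_column' group being adjacent, as they are here: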

dt[, labels := cumsum(!duplicated(second_column))*(third_column <= 1000)]
dt
#    first_column second_column third_column labels
#1:        item1          cat1           50      1
#2:        item2          cat1           10      1
#3:        item3          cat1           18      1
#4:        item4          cat2         3092      0
#5:        item5          cat2          731      2
#6:        item6          cat2          189      2
#7:        item7          cat2         1991      0
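
Both options above number the groups by 'second_column'. If the labels should instead follow runs of consecutive rows with third_column <= 1000, regardless of category (which is how the title could also be read), one possible sketch uses data.table's rleid(). This is a swapped-in alternative, not part of the answer above, and the helper columns `below` and `run` are hypothetical names chosen for illustration.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  first_column  = paste0("item", 1:7),
  second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
  third_column  = c(50, 10, 18, 3092, 731, 189, 1991)
)

# Helper columns (hypothetical names): 'below' flags the qualifying rows,
# 'run' gives every stretch of identical consecutive 'below' values its own id
dt[, below := third_column <= 1000]
dt[, run := rleid(below)]

# Start with 0 everywhere, then number only the runs where 'below' is TRUE;
# .GRP counts those runs in order of appearance
dt[, labels := 0L]
dt[below == TRUE, labels := .GRP, by = run]

# Drop the helper columns again
dt[, c("below", "run") := NULL]
print(dt$labels)  # 1 1 1 0 2 2 0
```

On the question's data this gives the same labels as the two options above. The results would differ if a single category contained two separate runs of values <= 1000.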