R data.table: modifying a column using grouping and a complex filter condition

Time: 2018-12-18 09:25:18

Tags: r if-statement data.table filtering grouping

Here is a sample of the data:

zz <- "
id  Sub_Segment1    Sub_Segment2    Sub_Segment3    Sub_Segment4    Sub_Segment5
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
1   x   x1  r   y1  z1
2   y   x2  r   y2  z1
2   y   x2  r   y2  z1
2   y   x2  r   y2  z1
2   y   x2  r   y2  z1
2   y   x2  r   y2  z1
"

library(data.table)
Data <- read.table(text=zz, header = TRUE)
setDT(Data)

If I apply the modification to the whole table, the new column is filled with NA:

Data[(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1),name:="test" , by=id ]

This returns:

 id Sub_Segment1 Sub_Segment2 Sub_Segment3 Sub_Segment4 Sub_Segment5 name
 1:  1            x           x1            r           y1           z1   NA
 2:  1            x           x1            r           y1           z1   NA
 3:  1            x           x1            r           y1           z1   NA
 4:  1            x           x1            r           y1           z1   NA
 5:  1            x           x1            r           y1           z1   NA
 6:  1            x           x1            r           y1           z1   NA
 7:  1            x           x1            r           y1           z1   NA
 8:  2            y           x2            r           y2           z1   NA
 9:  2            y           x2            r           y2           z1   NA
10:  2            y           x2            r           y2           z1   NA
11:  2            y           x2            r           y2           z1   NA
12:  2            y           x2            r           y2           z1   NA

However, if I first extract only a sample in which the sub-segment has a constant value, it works:

new_data = Data[id ==1]
new_data[(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1),name:="test" , by=id ]

This returns the correct result:

   id Sub_Segment1 Sub_Segment2 Sub_Segment3 Sub_Segment4 Sub_Segment5 name
1:  1            x           x1            r           y1           z1 test
2:  1            x           x1            r           y1           z1 test
3:  1            x           x1            r           y1           z1 test
4:  1            x           x1            r           y1           z1 test
5:  1            x           x1            r           y1           z1 test
6:  1            x           x1            r           y1           z1 test
7:  1            x           x1            r           y1           z1 test

Data[id ==1,(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1) ] # returns TRUE

How should I modify my code so that the condition is applied to each group of the dataset, in the data.table way?

1 Answer:

Answer 0: (score: 2)

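The i argument is evaluated once on the whole table, before by splits it into groups. Across the full table Sub_Segment1 holds two distinct values, so the condition is FALSE and no row is updated. You can instead move the identification of the rows to edit into j, the part of the command where columns are selected, so that it is evaluated per group: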

# load data table package
library(data.table)
# create the data table from string
Data <- read.table(text=zz, header = TRUE)
setDT(Data)
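# group by id and set name where the condition is met;
# NA_character_ (rather than plain NA) keeps the column type character in every group
Data[, name := ifelse(length(unique(Sub_Segment1[Sub_Segment1 != ""])) == 1, "test", NA_character_), by = id]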
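To see the difference between testing the condition on the whole table and testing it per group, it may help to evaluate the expression both ways. This is only a minimal sketch: it rebuilds a shortened copy of the question's sample data (same two ids, only the Sub_Segment1 column) so that it runs by itself, and the names whole_table and per_group are made up for illustration.

```r
library(data.table)

# Shortened copy of the question's sample data (assumption: only the
# columns needed for the condition are kept)
Data <- data.table(
  id           = c(rep(1L, 7), rep(2L, 5)),
  Sub_Segment1 = c(rep("x", 7), rep("y", 5))
)

# Without by, the test sees all rows at once: "x" and "y" are both
# present, so there are two distinct values and the test fails.
whole_table <- Data[, length(unique(Sub_Segment1[Sub_Segment1 != ""])) == 1]

# With by = id, the same test runs once for each group.
per_group <- Data[
  , .(constant = length(unique(Sub_Segment1[Sub_Segment1 != ""])) == 1),
  by = id
]

print(whole_table)
print(per_group)
```

An expression placed in i behaves like the first call, which is why the original attempt selected no rows.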

Instead of ifelse, you could also apply the modification to every row and then filter, or use a join operation.
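The join variant could look roughly like the sketch below. It is one possible reading of that suggestion, not necessarily what the answerer had in mind. The data are a shortened copy of the question's sample, plus a hypothetical id 3 with two different values (not in the original data) so that a non-matching group is visible; the helper table ids is also made up for illustration.

```r
library(data.table)

# Shortened sample data; id 3 is a hypothetical extra group whose
# Sub_Segment1 is not constant
Data <- data.table(
  id           = c(rep(1L, 7), rep(2L, 5), rep(3L, 2)),
  Sub_Segment1 = c(rep("x", 7), rep("y", 5), "a", "b")
)

# Step 1: per id, count the distinct non-empty values and keep the ids
# that have exactly one
ids <- Data[Sub_Segment1 != "", .(n = uniqueN(Sub_Segment1)), by = id][n == 1, .(id)]

# Step 2: update join, only rows whose id appears in ids receive "test";
# all other rows are left as NA
Data[ids, on = "id", name := "test"]

print(Data)
```

Compared with ifelse inside j, this separates finding the qualifying groups from updating the rows, which can be easier to read when the condition grows more complex.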