Here is a data sample:
zz <- "
id Sub_Segment1 Sub_Segment2 Sub_Segment3 Sub_Segment4 Sub_Segment5
1 x x1 r y1 z1
1 x x1 r y1 z1
1 x x1 r y1 z1
1 x x1 r y1 z1
1 x x1 r y1 z1
1 x x1 r y1 z1
1 x x1 r y1 z1
2 y x2 r y2 z1
2 y x2 r y2 z1
2 y x2 r y2 z1
2 y x2 r y2 z1
2 y x2 r y2 z1
"
library(data.table)
Data <- read.table(text=zz, header = TRUE)
setDT(Data)
If I apply the modification to the whole table, the new column is filled with NA:
Data[(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1),name:="test" , by=id ]
This returns:
id Sub_Segment1 Sub_Segment2 Sub_Segment3 Sub_Segment4 Sub_Segment5 name
1: 1 x x1 r y1 z1 NA
2: 1 x x1 r y1 z1 NA
3: 1 x x1 r y1 z1 NA
4: 1 x x1 r y1 z1 NA
5: 1 x x1 r y1 z1 NA
6: 1 x x1 r y1 z1 NA
7: 1 x x1 r y1 z1 NA
8: 2 y x2 r y2 z1 NA
9: 2 y x2 r y2 z1 NA
10: 2 y x2 r y2 z1 NA
11: 2 y x2 r y2 z1 NA
12: 2 y x2 r y2 z1 NA
However, if I extract only a subset in which the sub-segment has a constant value, it works:
new_data = Data[id ==1]
new_data[(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1),name:="test" , by=id ]
This returns the correct result:
id Sub_Segment1 Sub_Segment2 Sub_Segment3 Sub_Segment4 Sub_Segment5 name
1: 1 x x1 r y1 z1 test
2: 1 x x1 r y1 z1 test
3: 1 x x1 r y1 z1 test
4: 1 x x1 r y1 z1 test
5: 1 x x1 r y1 z1 test
6: 1 x x1 r y1 z1 test
7: 1 x x1 r y1 z1 test
and
Data[id ==1,(length(unique(Sub_Segment1[Sub_Segment1!=""]))==1) ] # returns TRUE
How should I modify my code so that the condition is applied to each group of the dataset, the data.table way?
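Answer 0 (score: 2)
You can move the condition that identifies the rows to edit out of i and into j, the part of the call where columns are selected. In Data[i, j, by], i is evaluated once against the whole table before grouping, so your condition saw both x and y, came out FALSE, and no rows were updated. In j it is evaluated once per id: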
# load data table package
library(data.table)
# create the data table from string
Data <- read.table(text=zz, header = TRUE)
setDT(Data)
# group by id and set name where the condition is matched
# NA_character_ keeps name a character column in every group
Data[, name := ifelse(length(unique(Sub_Segment1[Sub_Segment1!=""])) == 1, "test", NA_character_) , by=id ]
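To see the difference between the two placements, it can help to evaluate the condition on its own, once on the whole table and once per group. This is only a sketch: DT is a hypothetical, reduced stand-in for Data that keeps just the two columns the condition uses.

```r
library(data.table)

# Hypothetical reduced stand-in for Data (assumption: only id and Sub_Segment1 matter here)
DT <- data.table(id = c(1L, 1L, 1L, 2L, 2L),
                 Sub_Segment1 = c("x", "x", "x", "y", "y"))

# Without by, the expression sees the whole column at once:
# two distinct values ("x" and "y"), so the result is a single FALSE
whole <- DT[, length(unique(Sub_Segment1[Sub_Segment1 != ""])) == 1]

# With by = id, the same expression is computed once per group
per_group <- DT[, .(constant = length(unique(Sub_Segment1[Sub_Segment1 != ""])) == 1), by = id]

print(whole)
print(per_group)
```

The first result is what the original call effectively used as its row filter, which is why no row received "test".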
As an alternative to ifelse, you could also compute the value for every group and filter afterwards, or use a join operation.
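The join variant might look roughly like the following. It is a sketch under assumptions, not the answerer's own code: DT is a hypothetical stand-in for Data with an extra id 3 whose sub-segment is not constant, and flags and keep are illustrative names.

```r
library(data.table)

# Hypothetical stand-in for Data; id 3 is added to show a group that fails the condition
DT <- data.table(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
                 Sub_Segment1 = c("x", "x", "x", "y", "y", "a", "b"))

# Step 1: one row per id with the number of distinct non-empty values
flags <- DT[Sub_Segment1 != "", .(n_distinct = uniqueN(Sub_Segment1)), by = id]

# Step 2: keep only the ids whose sub-segment is constant
keep <- flags[n_distinct == 1]

# Step 3: update join; only rows whose id appears in keep receive "test",
# all other rows are left as NA
DT[keep, on = "id", name := "test"]

print(DT)
```

Compared with ifelse inside j, this separates the per-group summary from the update, which may be easier to inspect when the condition grows more complex.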