I am trying to write a function that mainly aggregates, merges, and subsets a dataset. My data frame looks like this:
NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8
The function call I want to run is:
test <- Fun(data, Score1, NameB)
First, I want to compute the mean of Score1, grouped by NameB:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
}
After some additional coding, my data frame becomes:
NameA NameB NameC Score1 Score2 Group_Mean
A F K 3 3 4
B F L 5 5 4
C F M 4 4 4
D G N 2 2 5
E G O 5 8 5
Then I want to subset my data frame to the rows where Score1 != Score2. So I wrote:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
df2 <- df1[which(df1[col] != df[Score2])]
}
But this gives me an error message:
Error in Ops.data.frame(df2[col], df[Score2]) :
‘==’ only defined for equally-sized data frames
After this step, I would like to do more math and subsetting, as follows:
Fun <- function(df, col, group_by){
setDT(df)
df1<- df[, sapply(.SD, mean), .SDcols = col, by= group_by]
df2 <- df1[which(df1[col] != df[Score2])]
df2["NewCol"] <- abs(df2[col] - df2[Score2])
output <- df2[which(df2[NewCol] > 1 or df2[NewCol] < 1.5)]
return(output)
}
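I am new to R and to user-defined functions in R. I have been stuck at the error message for a long time. If anyone could give me any suggestions about my code above, I would be very grateful!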
Answer 0 (score: 1)
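I am not sure it is wise to encourage someone new to R to get into the mix of data.table syntax and function calls.
However, here are some sample functions.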
library(data.table)
data <- fread(
"NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8"
)
Fun1 <- function(df, col, group_by){
setDT(df)[, sapply(.SD, mean), .SDcols = col, by = group_by]
}
Fun1(data, "Score1", "NameB")
   NameB  V1
1:     F 5.0
2:     G 3.5
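The V1 column name above is what data.table assigns to an unnamed result. As a minimal sketch, assuming you would rather keep the original column name, lapply() can be used over .SD instead of sapply(). Fun1b is a hypothetical name and not part of the original code:

```r
library(data.table)

data <- fread(
"NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8"
)

# Fun1b is a hypothetical variant of Fun1.
# lapply() returns a named list, so the aggregated column keeps the name in `col`.
Fun1b <- function(df, col, group_by) {
  setDT(df)[, lapply(.SD, mean), .SDcols = col, by = group_by]
}

res <- Fun1b(data, "Score1", "NameB")
print(res)
```

Because `col` is passed to .SDcols, this variant should also accept several column names at once, e.g. c("Score1", "Score2").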
Note that Score2 is used in the next example in order to reproduce the data frame depicted by the OP:
Fun2 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
}
Fun2(data, "Score2", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     A     F     K      3      3          4
2:     B     F     L      5      5          4
3:     C     F     M      7      4          4
4:     D     G     N      2      2          5
5:     E     G     O      5      8          5
Example 3:
Fun3 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
df[get(col) != Score2]
}
Fun3(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean
1:     C     F     M      7      4        5.0
2:     E     G     O      5      8        3.5
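Fun3 refers to Score2 by its literal name, so it only works for data that has a column called Score2. A minimal sketch, assuming you want the comparison column to be a parameter as well, looks it up with get() in the same way as `col`. The function name Fun3b and the argument `other` are hypothetical:

```r
library(data.table)

data <- fread(
"NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8"
)

# Fun3b is a hypothetical variant of Fun3.
# `other` holds the name of the column that `col` is compared against.
Fun3b <- function(df, col, group_by, other) {
  setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
  # keep only the rows where the two columns differ
  df[get(col) != get(other)]
}

res <- Fun3b(data, "Score1", "NameB", "Score2")
print(res)
```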
Note that the function below has been modified from the OP's draft so that it returns a non-empty data.table:
Fun4 <- function(df, col, group_by){
setDT(df)[, Group_Mean := mean(get(col)), by = group_by]
df[, NewCol := abs(get(col) - Group_Mean)]
df[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}
Fun4(data, "Score1", "NameB")[]
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     D     G     N      2      2        3.5    1.5
2:     E     G     O      5      8        3.5    1.5
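A side note on the condition in the OP's draft: R has no `or` keyword. Element-wise conditions are combined with `|` (or) and `&` (and). The short sketch below uses the NewCol values that Fun4 computes for this data to show how the two behave; the variable names are only for illustration:

```r
# NewCol values computed by Fun4 for the five rows of the example data
new_col <- c(2.0, 0.0, 2.0, 1.5, 1.5)

# "greater than 1 OR less than 1.5" holds for every number,
# so it would not filter anything
either <- new_col > 1 | new_col < 1.5

# "at least 1 AND at most 1.5" is the closed interval that
# between(NewCol, 1.0, 1.5, incbounds = TRUE) selects
both <- new_col >= 1 & new_col <= 1.5

print(either)
print(both)
```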
Note that data has been modified in place by all of the previous function calls:
data
   NameA NameB NameC Score1 Score2 Group_Mean NewCol
1:     A     F     K      3      3        5.0    2.0
2:     B     F     L      5      5        5.0    0.0
3:     C     F     M      7      4        5.0    2.0
4:     D     G     N      2      2        3.5    1.5
5:     E     G     O      5      8        3.5    1.5
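If the in-place modification is not wanted, one option is to work on a copy inside the function. This is a minimal sketch, assuming the extra memory for the copy is acceptable. Fun4_copy is a hypothetical name:

```r
library(data.table)

data <- fread(
"NameA NameB NameC Score1 Score2
A F K 3 3
B F L 5 5
C F M 7 4
D G N 2 2
E G O 5 8"
)

# Fun4_copy is a hypothetical variant of Fun4 that leaves its input untouched.
Fun4_copy <- function(df, col, group_by) {
  dt <- copy(df)  # deep copy, so := below cannot alter the caller's object
  setDT(dt)       # converts the copy if a plain data.frame was passed in
  dt[, Group_Mean := mean(get(col)), by = group_by]
  dt[, NewCol := abs(get(col) - Group_Mean)]
  dt[between(NewCol, 1.0, 1.5, incbounds = TRUE)]
}

res <- Fun4_copy(data, "Score1", "NameB")
print(res)
print(names(data))  # still the five original columns
```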