I'm trying to summarize data from a signal detection experiment in order to compute hit rates, false-alarm rates, and so on.
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4
BAX-011 3 CR FA HIT FA FR CR FA FA
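My variables bf1 to bm4 are factors with the levels (hit, fa, cr, fr).
I want to count the number of hits, false alarms, etc. for each participant (row), but separately for subsets of the variables (the bf items and the bm items). What is the easiest way to do this?
In the end it should look like this: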
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4 bf_hits bm_hits bf_fa ...
BAX-011 3 CR FA HIT FA FR CR FA FA 1 0 2 ...
Answer 0 (score: 1)
If I understand your question correctly, you may just need to explore melt and dcast from the "reshape2" package. Using @zx8754's sample data, try the following:
library(reshape2)
### Make the data into a "long" format
dfL <- melt(df, id.vars=c("Code", "Cond"))
### Split the existing "variable" column.
### Here's one way to do that.
dfL <- cbind(dfL, setNames(
  do.call(rbind.data.frame, strsplit(
    as.character(dfL$variable), "(?=\\d)", perl=TRUE)),
  c("var", "time")))
### This is what the data now look like.
head(dfL)
# Code Cond variable value var time
# 1 BAX-011 3 bf1 CR bf 1
# 2 BAX-012 3 bf1 CR bf 1
# 3 BAX-013 3 bf1 CR bf 1
# 4 BAX-011 3 bf2 FA bf 2
# 5 BAX-012 3 bf2 FA bf 2
# 6 BAX-013 3 bf2 HIT bf 2
### Use `dcast` to aggregate the data.
### The default function is "length" which is what you're looking for.
dcast(dfL, Code + Cond ~ var + value, value.var="value")
# Aggregation function missing: defaulting to length
# Code Cond bf_CR bf_FA bf_HIT bm_CR bm_FA bm_FR bm_HIT
# 1 BAX-011 3 1 2 1 1 2 1 0
# 2 BAX-012 3 1 2 1 0 2 1 1
# 3 BAX-013 3 1 1 2 0 2 1 1
From there, you can always merge or cbind the relevant columns to get the complete data.frame.
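As a minimal sketch of that last step, the counts can be joined back onto the original columns with merge. The object names counts and full are made up for this illustration, and sub() is used here as a shortcut for dropping the item number (it is not the strsplit approach shown above). The data are the dummy data from the other answer.

```r
library(reshape2)

# Dummy data from the other answer (stringsAsFactors = FALSE keeps the responses as character)
df <- read.table(text = "
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4
BAX-011 3 CR FA HIT FA FR CR FA FA
BAX-012 3 CR FA HIT FA FR HIT FA FA
BAX-013 3 CR HIT HIT FA FR HIT FA FA
", header = TRUE, stringsAsFactors = FALSE)

# Long format: one row per participant and item
dfL <- melt(df, id.vars = c("Code", "Cond"))

# Illustrative shortcut: strip the trailing digits to get the item group ("bf" or "bm")
dfL$var <- sub("[0-9]+$", "", dfL$variable)

# One count column per group/response combination
counts <- dcast(dfL, Code + Cond ~ var + value,
                value.var = "value", fun.aggregate = length)

# Join the counts back onto the original wide data
full <- merge(df, counts, by = c("Code", "Cond"))
full
```

merge matches on Code and Cond, so it does not depend on the two data frames being in the same row order, which cbind would.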
So as not to come across as a "reshape2" fanboy, here is a base R approach. I hope it also illustrates why I took the "reshape2" route in this case:
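### Find the item columns and convert them from factor to character.
X <- grep("^bf|^bm", names(df))
df[X] <- lapply(df[X], as.character)
### Stack the item columns into a "long" format. stack() names its
### output columns "values" and "ind"; Code and Cond are recycled to match.
dfL <- cbind(df[c("Code", "Cond")], stack(df[X]))
### Split "ind" into the item group ("bf"/"bm") and the item number.
dfL <- cbind(dfL, setNames(
  do.call(rbind.data.frame, strsplit(
    as.character(dfL$ind), "(?=\\d)", perl=TRUE)),
  c("var", "time")))
### Count each group/response combination, then reshape to "wide".
dfL$X <- paste(dfL$var, dfL$values, sep ="_")
dfA <- aggregate(values ~ Code + Cond + X, dfL, length)
reshape(dfA, direction = "wide", idvar=c("Code", "Cond"), timevar="X")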
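One thing to watch with the aggregate/reshape route is that combinations that never occur for a participant (for example, no HIT among the bm items) come out as NA rather than 0, because aggregate only returns rows for combinations that exist. A possible base R alternative, sketched here with xtabs, fills those cells with 0. The names item_cols, long, key and counts are invented for this sketch, and the data are the dummy data from the other answer.

```r
# Dummy data from the other answer
df <- read.table(text = "
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4
BAX-011 3 CR FA HIT FA FR CR FA FA
BAX-012 3 CR FA HIT FA FR HIT FA FA
BAX-013 3 CR HIT HIT FA FR HIT FA FA
", header = TRUE, stringsAsFactors = FALSE)

# Item columns: "bf" or "bm" followed by digits only
item_cols <- grep("^(bf|bm)[0-9]+$", names(df), value = TRUE)

# Long format built by hand: values are taken column by column,
# so Code repeats as a whole block and the group repeats once per row
long <- data.frame(
  Code  = rep(df$Code, times = length(item_cols)),
  group = rep(sub("[0-9]+$", "", item_cols), each = nrow(df)),
  value = unlist(df[item_cols], use.names = FALSE)
)
long$key <- paste(long$group, long$value, sep = "_")

# Cross-tabulate participants against group_response keys;
# combinations that never occur are counted as 0
counts <- as.data.frame.matrix(xtabs(~ Code + key, data = long))
counts
```

The result has one row per Code (as row names) and one column per observed group/response key, so it can be attached to the original data by matching on Code.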
Answer 1 (score: 0)
Try this:
#dummy data
df <- read.table(text="
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4
BAX-011 3 CR FA HIT FA FR CR FA FA
BAX-012 3 CR FA HIT FA FR HIT FA FA
BAX-013 3 CR HIT HIT FA FR HIT FA FA
", header=TRUE)
#count HITs per bf bm
#(patterns are anchored so the new bf_hit/bm_hit columns are never matched on a re-run)
df$bf_hit <- rowSums(df[grepl("^bf[0-9]", colnames(df))]=="HIT")
df$bm_hit <- rowSums(df[grepl("^bm[0-9]", colnames(df))]=="HIT")
#output
df
#Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4 bf_hit bm_hit
#1 BAX-011 3 CR FA HIT FA FR CR FA FA 1 0
#2 BAX-012 3 CR FA HIT FA FR HIT FA FA 1 1
#3 BAX-013 3 CR HIT HIT FA FR HIT FA FA 2 1
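The same rowSums idea can be extended to all four response types and both item groups, which gets closer to the layout asked for in the question. The following is only a sketch: the helper count_responses and the vector responses are hypothetical names, and the rate formulas assume that FR means a false rejection (a miss), which the question does not state explicitly.

```r
# Dummy data as above
df <- read.table(text = "
Code Cond bf1 bf2 bf3 bf4 bm1 bm2 bm3 bm4
BAX-011 3 CR FA HIT FA FR CR FA FA
BAX-012 3 CR FA HIT FA FR HIT FA FA
BAX-013 3 CR HIT HIT FA FR HIT FA FA
", header = TRUE, stringsAsFactors = FALSE)

responses <- c("HIT", "FA", "CR", "FR")

# Hypothetical helper: per-row counts of each response type
# across the item columns that start with `prefix`
count_responses <- function(data, prefix) {
  cols <- grep(paste0("^", prefix, "[0-9]+$"), names(data))
  m <- as.matrix(data[cols])                       # character matrix of responses
  counts <- lapply(responses, function(r) rowSums(m == r))
  names(counts) <- paste(prefix, tolower(responses), sep = "_")
  as.data.frame(counts)
}

out <- cbind(df, count_responses(df, "bf"), count_responses(df, "bm"))

# Assumption: FR is a miss, so hit rate = hits / (hits + misses)
# and false-alarm rate = false alarms / (false alarms + correct rejections)
out$bf_hit_rate <- out$bf_hit / (out$bf_hit + out$bf_fr)
out$bf_fa_rate  <- out$bf_fa  / (out$bf_fa  + out$bf_cr)
out
```

The bm rates would follow the same pattern. A participant with no signal trials (or no noise trials) in a group gets NaN from the division, which may need separate handling.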