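I have a data.frame with two columns: an identifier and a result. I need to go through the data.frame and count how many times each identifier occurs, as well as how many times each result occurs for it. The result column has three possible values: Positive, Negative, or Ambiguous. For example, if there are 10 "RVP PCR" identifiers, I need one row with four count columns, "Count", "Positive", "Negative", and "Ambiguous", each holding the number of occurrences. So for 10 "RVP PCR" identifiers, the output row should show the identifier, then a count of 10, 7 negatives, 1 positive, and 2 ambiguous. How would you do this in R?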
> str(foo)
'data.frame': 51 obs. of 2 variables:
$ identifier: Factor w/ 99 levels "ADENOPCR","ALB-BF",..: 51 51 56 56 57 57 57 57 18 18 ...
$ result : Factor w/ 3 levels "Ambiguous","Negative",..: 2 1 2 1 2 1 2 1 2 1 ...
> dput(foo)
structure(list(identifier = structure(c(80L, 80L, 80L, 80L, 80L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 64L,
18L, 18L, 76L, 76L, 76L, 70L, 70L, 70L, 70L, 71L, 64L, 77L, 77L,
77L, 77L, 77L, 77L, 77L, 77L, 76L), .Label = c("ADENOPCR", "ALB-BF",
"ASPERAG", "ASPERAGB", "BDGLUCAN", "BLASTO", "BORD PCR", "BPERT",
"CMV QNT", "CMVPCR", "COCCI", "COCCI G/M", "COCCI PAN", "COCCI-PPT",
"CPNEUMOPCR", "CRP", "CRY BLD", "CWP-KOH", "DIFF CONF", "EBV PAN",
"EBV PAN 2", "EBV QNT", "EXCEPT", "EXCEPT TT", "FLUFAC", "FUNG PKG",
"FUNGSEQ", "GLU-FL", "HERP I", "HHV6PCR", "HISTO", "HISTO PPT",
"HISTOAG S", "HISTOGM U", "HMPVFA", "HMPVPCR", "HSVPCR", "LEGAG-U",
"LEGIONFA", "LEGIONPCR", "MA AFB", "MA FUNGAL", "MA MIC", "MA MTBPRIM",
"MC AFB", "MC AFBID", "MC AFBR", "MC BAL", "MC BLD", "MC CYST",
"MC FUNG", "MC FUNGID", "MC Legion", "MC LEGION", "MC MTD", "MC NOC",
"MC RESP", "MC STAPH", "MC Strep", "MC STREP", "MC VRE", "MC W",
"MICROSEQ", "MPNEUMOPCR", "MS CWP", "MTBRIF PCR", "MYCO-M", "NG REPORT",
"ORGSEQ", "PARAFLUPCR", "PCP PCR", "PNEUMO AB", "PNEUMST", "PNEUMST R",
"RESPMINI", "RESPMINI ", "RSPFA", "RSPFAC", "RSV", "RVP PCR",
"RVPPCR", "SPN AG", "TP-FL", "V CMVC", "V FLUC", "V HSVC", "V HSVCT",
"V RESPC", "V Urea", "V VIC", "V VIC R", "V VIRAL", "V VIRAL N",
"V VIRAL R", "V VZV", "VDRL CSF", "VZVFAC", "VZVPCR", "WNILE PCR"
), class = "factor"), result = structure(c(2L, 2L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L,
2L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Ambiguous",
"Negative", "Positive"), class = "factor")), .Names = c("identifier",
"result"), row.names = 1500:1550, class = "data.frame")
Answer 0 (score: 2)
I'm not entirely sure what your expected output is, but you can reshape the data:
library(reshape2)
dcast(foo, identifier ~ result, fun.aggregate = length, value.var = "result")
This produces:
  identifier Negative Positive
1    CWP-KOH        2        0
2 MPNEUMOPCR        0        2
3 PARAFLUPCR        3        1
4    PCP PCR        0        1
5   RESPMINI        4        0
6      RSPFA        7        1
7    RVP PCR       28        2
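Edit:
With the data you provided, "RVP PCR" cannot produce the result you describe: it has 30 rows (28 Negative, 2 Positive), and no row in the sample is Ambiguous.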
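Answer 1 (score: 2)
library(dplyr)
library(tidyr)
foo %>%
  group_by(identifier, result) %>%
  summarise(n = n()) %>%
  # drop = FALSE keeps unused factor levels, so an Ambiguous column is created
  spread(key = result, value = n, drop = FALSE, fill = 0) %>%
  mutate(Total = Ambiguous + Negative + Positive) %>%
  # remove the identifier levels that never occur in the data
  filter(Total > 0)
Result: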
Source: local data frame [7 x 5]
Groups: identifier [7]
  identifier Ambiguous Negative Positive Total
      (fctr)     (dbl)    (dbl)    (dbl) (dbl)
1    CWP-KOH         0        2        0     2
2 MPNEUMOPCR         0        0        2     2
3 PARAFLUPCR         0        3        1     4
4    PCP PCR         0        0        1     1
5   RESPMINI         0        4        0     4
6      RSPFA         0        7        1     8
7    RVP PCR         0       28        2    30
Answer 2 (score: 1)
Without any additional packages, you can do this:
xtabs(~ identifier + result, data=droplevels(foo))
This gives the following result:
> xtabs(~ identifier + result, data=droplevels(foo))
            result
identifier   Negative Positive
  CWP-KOH           2        0
  MPNEUMOPCR        0        2
  PARAFLUPCR        3        1
  PCP PCR           0        1
  RESPMINI          4        0
  RSPFA             7        1
  RVP PCR          28        2
If you need a data frame:
as.data.frame(unclass(xtabs(~ identifier + result, data=droplevels(foo))))
If you want the result in long format, you can also do this:
foo$count <- 1
aggregate(count ~ identifier+result, data=foo, FUN=length)
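The base R approach above can be extended to the exact layout the question asks for (identifier, Count, and one column per result). The following is a minimal sketch, not part of the original answer; `toy` is hypothetical data that mirrors the example in the question (10 "RVP PCR" rows with 7 negative, 1 positive, and 2 ambiguous), and `tab`, `counts`, and `out` are illustrative names.

```r
# Hypothetical toy data mirroring the example in the question:
# 10 "RVP PCR" rows (7 negative, 1 positive, 2 ambiguous) plus 3 "RSPFA" rows.
toy <- data.frame(
  identifier = factor(c(rep("RVP PCR", 10), rep("RSPFA", 3))),
  result = factor(
    c(rep(c("Negative", "Positive", "Ambiguous"), times = c(7, 1, 2)),
      rep("Negative", 3)),
    levels = c("Ambiguous", "Negative", "Positive")
  )
)

# Cross-tabulate, then turn the table into a plain data frame of counts
tab <- table(toy$identifier, toy$result)
counts <- as.data.frame.matrix(tab)

# Prepend the identifier and the per-identifier total
out <- cbind(identifier = rownames(counts), Count = rowSums(counts), counts)
rownames(out) <- NULL
out
```

On the real `foo`, calling `droplevels(foo)` first, as the answer does, should keep unused identifier levels from showing up as all-zero rows. Because `table()` keeps every level of `result`, an Ambiguous column appears even when no row is ambiguous.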
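Answer 3 (score: 1)
The data is in long format. First use dcast from the reshape2 library to convert it to wide format, then add a Count column holding the row sums.
library(reshape2)
# Count rows per identifier/result pair; stating fun.aggregate avoids the default-to-length message
widedata <- dcast(foo, identifier ~ result, fun.aggregate = length, value.var = "result")
# Sum every result column (all columns except identifier). This works whether or not
# an Ambiguous column is present; the sample data has none, the full data will.
widedata$Count <- rowSums(widedata[, -1, drop = FALSE])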
Answer 4 (score: 0)
library(tidyr)
library(dplyr)
foo %>%
count(identifier, result) %>%
spread(result, n) # or spread(result, n, fill = 0, drop = FALSE)
#   identifier Negative Positive
#       (fctr)    (int)    (int)
# 1    CWP-KOH        2       NA
# 2 MPNEUMOPCR       NA        2
# 3 PARAFLUPCR        3        1
# 4    PCP PCR       NA        1
# 5   RESPMINI        4       NA
# 6      RSPFA        7        1
# 7    RVP PCR       28        2
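In current tidyr releases `spread()` is superseded by `pivot_wider()`, so the same pipeline might be written as below. This is a sketch under assumptions, not part of the original answer: it assumes dplyr 0.8 or later (for the `.drop` argument of `count()`) and tidyr 1.0 or later, and `toy` and `wide` are hypothetical names, with the toy data mirroring the example in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data mirroring the example in the question:
# 10 "RVP PCR" rows (7 negative, 1 positive, 2 ambiguous) plus 3 "RSPFA" rows.
toy <- data.frame(
  identifier = factor(c(rep("RVP PCR", 10), rep("RSPFA", 3))),
  result = factor(
    c(rep(c("Negative", "Positive", "Ambiguous"), times = c(7, 1, 2)),
      rep("Negative", 3)),
    levels = c("Ambiguous", "Negative", "Positive")
  )
)

wide <- toy %>%
  # .drop = FALSE keeps zero-count combinations of the factor levels
  count(identifier, result, .drop = FALSE) %>%
  # one column per result level; any missing cell is filled with 0
  pivot_wider(names_from = result, values_from = n, values_fill = list(n = 0)) %>%
  # total number of rows per identifier
  mutate(Count = Ambiguous + Negative + Positive)

wide
```

With `.drop = FALSE`, every level of `identifier` gets a row, so on the real `foo` it is probably worth calling `droplevels()` on the identifier column first, or filtering on `Count > 0` afterwards, as Answer 1 does with `Total`.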