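Good day,
I want to visualize my dataset, but I'm struggling to even name the type of visualization I need!
I want to look at the overlapping sets between a reference standard and three new tests.
The reference standard has a binary outcome (R and S).
Each of the three new tests can have more than two outcomes (R, S, failed, indeterminate).
So a portion of my data looks like this (as an R data frame):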
Subject <- c("11-0001","11-0002","11-0003","11-0004","11-0005","11-0007","11-0008","11-0010","11-0011","11-0012","11-0013","11-0014","11-0015","11-0016","11-0017","11-0018","11-0019","11-0020","11-0021","11-0022","11-0023","11-0025","11-0027","11-0029","11-0030","11-0035","11-0036","11-0037","11-0038","11-0039","11-0040","11-0041","11-0043","11-0044","11-0045","11-0046","11-0047","11-0048","11-0050","11-0052","11-0053","11-0054","11-0055","11-0056","11-0058","11-0059","11-0061","11-0062","11-0063","11-0064","11-0065","11-0066","11-0068","11-0069","11-0070","11-0071","11-0072","11-0074","11-0075")
ReferenceStandard <- c("R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","R","S","R","S")
TestA<- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","I","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","I","R","I","R","R","I","R","S","R","R","R","R","S","I","S","R","S")
TestB <- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","I","R","R","R","R","S","I","S","R","S")
TestC <-c("R","R","R","R","R","R","R","R","R","R","R","ND","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","ND","S","R","S")
mydata <- data.frame(Subject=Subject, ReferenceStandard=ReferenceStandard, TestA=TestA, TestB=TestB, TestC=TestC)
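And so on (I have 1000 subjects)...
So although the sensitivity/specificity of each individual test against the reference standard is very similar, Cochran's and McNemar's tests show significant differences between them.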
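For readers who want to reproduce this kind of paired comparison, here is a minimal sketch in base R. It uses a small made-up data frame (`toy`, not the real data) and assumes that any result other than the reference value, including I, counts as a miss; that scoring rule is an assumption, not something stated in the question.

```r
# Hypothetical toy data, not the real study data
toy <- data.frame(
  ReferenceStandard = c("R","R","R","R","S","S","R","R","S","R"),
  TestA             = c("R","I","R","R","S","S","S","R","S","R"),
  TestB             = c("R","R","R","I","S","I","R","R","S","S"),
  stringsAsFactors = FALSE
)

# Assumption: a test is "correct" only when it equals the reference
correctA <- toy$TestA == toy$ReferenceStandard
correctB <- toy$TestB == toy$ReferenceStandard

# Paired 2x2 table: rows = TestA correct?, columns = TestB correct?
paired <- table(A = factor(correctA, levels = c(TRUE, FALSE)),
                B = factor(correctB, levels = c(TRUE, FALSE)))
print(paired)

# McNemar's test only uses the discordant (off-diagonal) cells
res <- mcnemar.test(paired)
print(res)
```

The off-diagonal cells of `paired` are the subjects where exactly one of the two tests missed, which is the part of the data that per-test sensitivity/specificity cannot show.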
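Now, my hypothesis is that each test fails differently. So TestA might fail on one group of subjects while TestB fails on a different group. Overall the counts are similar enough that sensitivity/specificity come out very similar, but the paired-sample statistical tests highlight that the tests are not behaving the same way. So I'd like to check this visually.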
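One way to check this hypothesis before plotting anything is to count, per subject, which set of tests disagreed with the reference. The sketch below does that in base R on a made-up data frame (`toy` is hypothetical), again assuming that I and ND count as misses.

```r
# Hypothetical toy data, not the real study data
toy <- data.frame(
  Subject           = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R","R","R","S","S","R","R","S"),
  TestA             = c("R","I","R","S","S","S","R","S"),
  TestB             = c("R","R","I","S","I","R","R","S"),
  TestC             = c("R","R","R","S","S","R","ND","S"),
  stringsAsFactors = FALSE
)
tests <- c("TestA", "TestB", "TestC")

# Logical matrix: TRUE where a test disagrees with the reference
# (assumption: I and ND are treated as misses)
miss <- sapply(tests, function(nm) toy[[nm]] != toy$ReferenceStandard)
rownames(miss) <- toy$Subject

# Label each subject with the set of tests that missed, e.g. "TestA+TestB"
pattern <- apply(miss, 1, function(r) {
  if (any(r)) paste(tests[r], collapse = "+") else "none"
})

# How many subjects fall into each miss pattern
pattern_counts <- sort(table(pattern), decreasing = TRUE)
print(pattern_counts)
```

If the tests really fail on different subjects, most of the counts land in single-test patterns such as `TestA` or `TestB` rather than in combined ones such as `TestA+TestB`.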
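However, I'm really stuck on what to even call this (since the new tests have four categories).
I have looked into Euler diagrams, but I don't think they can support what I need.
The best I think I could do is make two sets of Euler diagrams.
I have also thought about an odd sort of heat map where the Y axis is all 1000 subjects and the X axis is laid out like my data above, with each of the four columns color coded by result. Depending on how I sort the Y axis, I could show different aspects of the data. The problem is that it's really hard to pick out patterns with this kind of graphic.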
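The heat map idea can be sketched in base R as below. This is only an illustration on made-up data (`toy`, the colour choices and the result codes in `codes` are assumptions); sorting the rows by the reference and then by each test puts identical result patterns next to each other, which makes the blocks easier to see.

```r
# Hypothetical toy data standing in for the real 1000-subject data frame
toy <- data.frame(
  Subject           = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R","S","R","R","S","R","R","S"),
  TestA             = c("R","S","I","R","S","S","R","S"),
  TestB             = c("R","S","R","I","I","R","R","S"),
  TestC             = c("R","S","R","R","S","R","ND","S"),
  stringsAsFactors = FALSE
)
cols  <- c("ReferenceStandard", "TestA", "TestB", "TestC")
codes <- c("R", "S", "I", "ND")  # assumed set of result codes

# Encode every result as an integer 1..4 so it can be drawn as a colour
m <- sapply(toy[cols], function(x) as.integer(factor(x, levels = codes)))
rownames(m) <- toy$Subject

# Sort subjects by the reference first, then by each test in turn
ord <- do.call(order, unname(as.list(as.data.frame(m))))
m_sorted <- m[ord, , drop = FALSE]

# image() expects z with one row per x value, hence the transpose
# Colours: R = firebrick, S = steelblue, I = grey60, ND = gold
image(x = seq_along(cols), y = seq_len(nrow(m_sorted)), z = t(m_sorted),
      col = c("firebrick", "steelblue", "grey60", "gold"),
      breaks = 0:4 + 0.5, axes = FALSE, xlab = "", ylab = "Subject (sorted)")
axis(1, at = seq_along(cols), labels = cols)
axis(2, at = seq_len(nrow(m_sorted)), labels = rownames(m_sorted), las = 1)
```

Changing the order of the columns passed to `order()` changes which aspect of the data forms contiguous blocks.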
Any other ideas? Links to other visualizations would be much appreciated!
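Answer 0 (score: 2)
Here is an attempt at visualizing your dataset. It's hard to know what to emphasize without the actual data, but here is a sample for other posters to play with. Based on your post, I tried to highlight the differences in the distribution of test results by Ref.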
library(reshape2)
library(ggplot2)
# make a data set
df <- data.frame(Subject=1:100, Ref = sample(c('R','S'),100,TRUE), TestA = sample(c('R','F','S','I'),100,TRUE), TestB = sample(c('R','F','S','I'),100,TRUE), TestC = sample(c('R','F','S','I'),100,TRUE) )
# melt into long
dfm <- melt(df, id=c('Subject','Ref'))
# and plot
ggplot(dfm, aes(x=variable, fill=value)) + geom_bar() + facet_wrap(~Ref)
# which gives a stacked bar chart of result counts per test, one panel per Ref value
# or bars dodged rather than stacked
ggplot(dfm, aes(x=variable, fill=value)) + geom_bar(position='dodge') + facet_wrap(~Ref)
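If what @shujaa says below is true, here is a similarly themed picture that highlights, for each test and each reference value, how often the result matches the reference (labelled TP here):
# compare as character so this also works when the columns are factors
dfm <- transform(dfm, TP = as.character(value) == as.character(Ref))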
ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(~Ref)
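The numbers behind a plot like this can also be tabulated directly. The following is a minimal base R sketch on made-up data (`toy` and `long` are hypothetical names); it computes the share of results that match the reference for each test within each Ref value.

```r
# Hypothetical toy data, not the simulated df above
toy <- data.frame(
  Ref   = c("R","R","R","R","S","S"),
  TestA = c("R","R","I","R","S","S"),
  TestB = c("R","S","R","R","S","I"),
  stringsAsFactors = FALSE
)

# Long format by hand: one row per subject and test
long <- data.frame(
  Ref      = rep(toy$Ref, times = 2),
  variable = rep(c("TestA", "TestB"), each = nrow(toy)),
  value    = c(toy$TestA, toy$TestB),
  stringsAsFactors = FALSE
)
long$TP <- long$value == long$Ref

# Proportion of matching results, tests in rows and Ref values in columns
agree <- with(long, tapply(TP, list(variable, Ref), mean))
print(agree)
```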
Or, following @shujaa's last comment, here is one last attempt:
ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(value~Ref)