Here is a sample of my large data matrix. Each column name encodes several pieces of information, separated by underscores.
structure(list(Gene = c("AGI4120.1_UBQ", "AGI570.1_Acin"), WT_Tissue_0T_1 = c(0.886461437, 1.093164915), WT_Tissue_0T_2 = c(1.075140682, 1.229862834), WT_Tissue_0T_3 = c(0.632903012, 1.094003128), WT_Tissue_1T_1 = c(0.883151274, 1.26322126), WT_Tissue_1T_2 = c(1.005627276, 0.962729188), WT_Tissue_1T_3 = c(0.87123469, 0.968078993), WT_Tissue_3T_1 = c(0.723601456, 0.633890322), WT_Tissue_3T_2 = c(0.392585237, 0.534819363), WT_Tissue_3T_3 = c(0.640185369, 1.021934772), WT_Tissue_5T_1 = c(0.720291294, 0.589244505), WT_Tissue_5T_2 = c(0.362131744, 0.475251717), WT_Tissue_5T_3 = c(0.549486925, 0.618177919), mut1_Tissue_0T_1 = c(1.464415756, 1.130533457), mut1_Tissue_0T_2 = c(1.01489573, 1.114915728), mut1_Tissue_0T_3 = c(1.171797418, 1.399956009), mut1_Tissue_1T_1 = c(0.927507448, 1.231911575), mut1_Tissue_1T_2 = c(1.089705396, 1.256782289 ), mut1_Tissue_1T_3 = c(0.993048659, 0.999044465), mut1_Tissue_3T_1 = c(1.000993049, 1.103486794), mut1_Tissue_3T_2 = c(1.062562066, 0.883617224 ), mut1_Tissue_3T_3 = c(1.037404833, 0.851875438), mut1_Tissue_5T_1 = c(0.730883813, 0.437440083), mut1_Tissue_5T_2 = c(0.480635551, 0.298762126 ), mut1_Tissue_5T_3 = c(0.85468388, 0.614923997)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Gene = structure(list(), class = c("collector_character", "collector")), WT_Tissue_0T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_0T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_0T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_1T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_3T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_3T_2 = structure(list(), class = c("collector_double", 
"collector")), WT_Tissue_3T_3 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_1 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_2 = structure(list(), class = c("collector_double", "collector")), WT_Tissue_5T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_0T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_1T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_3T_3 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_1 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_2 = structure(list(), class = c("collector_double", "collector")), mut1_Tissue_5T_3 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
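I want to follow up with a Tukey test and draw a bar plot for each gene (response vs. time, filled by the two genotypes) annotated with multiple-comparison letters.
Code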
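library(tidyr)
library(dplyr)

# Reshape to long format and split the column names on "_"
df1 <- df %>%
  gather(var, response, WT_Tissue_0T_1:mut1_Tissue_5T_3) %>%
  separate(var, c("Genotype", "Tissue", "Time"), sep = "_") %>%
  arrange(desc(Gene))
# Per-group mean, count, and standard error
df2 <- df1 %>%
  group_by(`Gene`, Genotype, Tissue, Time) %>%
  mutate(Response = mean(response), n = n(), se = sd(response) / sqrt(n))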
Two-way ANOVA
fit1 <- aov(Response ~ Genotype*Time, df2)
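After this, I want to run a Tukey test (multiple comparisons) for, say, the gene "AGI4120.1_UBQ", plot response against time, and see how each genotype (WT and mut1) behaves at each time point (0T, 1T, 3T, and 5T): whether the responses differ significantly, with letters on the plot marking the groups.
As shown below, the lsmeans call pools all genes together in its output. How can I make it loop over each gene separately (i.e. "AGI4120.1_UBQ", "AGI570.1_Acin") and get letters showing the statistically different groups (a "compact letter display")?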
lsmeans(fit1, pairwise ~ Genotype | Time)
My ultimate goal is to plot each gene as in the figure below, annotated with significance letters.
df2$genotype <- factor(df2$genotype, levels = c("WT","mut1"))
colours <- c('#336600','#ffcc00')
library(ggplot2)
ggplot(df2,aes( x=Time, y=Response, fill=Genotype))+
geom_bar(stat='identity', position='dodge')+
scale_fill_manual(values=colours)+
geom_errorbar(aes(ymin=average_measure-se, ymax=average_measure+se)+
facet_wrap(~`Gene`)+
labs(x='Time', y='Response')
[Image: expected graph with a compact letter display]
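I would greatly appreciate any help.
Answer 0 (score: 1)
There are a number of problems with your code. I'd even say this isn't really an appropriate post for StackOverflow, since the problems are varied and do not come down to one specific error or syntax issue. But I'll post a few suggestions as an answer to get you started; I hope they help.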
<强> 1。 lsmeans()
强>
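This function needs a fitted model (such as one from lm()) or a ref.grid object. But you are passing it a data frame, which has none of the regression attributes needed to compute least-squares means. (Think about it: how is lsmeans() supposed to know what the dependent variable should be when you ask for pairwise comparisons with Genes as the independent variable?) See the Using lsmeans documentation for more details.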
Judging from your data, I suspect you may want to run a multilevel regression using the lme4 package, with Gene, Genotype, and Time as nested grouping levels.
For demonstration purposes, though, I'll keep it simple with lm(). Passing the fitted regression object to lsmeans() works as expected:
library(lsmeans)

fit <- lm(Response ~ Gene + Genotype + Time, data=df2)
lsmeans(fit, pairwise ~ Gene)[[2]]
# Output
 contrast                        estimate        SE df t.ratio p.value
 AGI4120.1_UBQ - AGI570.1_Acin -0.0515123 0.0299492 42   -1.72  0.0928
Results are averaged over the levels of: Genotype, Time
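The question also asks for per-gene comparisons and letters, which the pooled lm() fit does not provide. Here is a minimal sketch of one possible approach, assuming long-format data shaped like df1: fit a separate two-way ANOVA per gene with split() and lapply(), run TukeyHSD() on the Genotype:Time cells, and, if the multcompView package happens to be installed, convert the adjusted p-values into letters with multcompLetters(). The toy data frame, its gene names, and its values are simulated for illustration only and are not taken from the question.

```r
# Simulated long-format data in the shape of df1 (all values are made up)
set.seed(1)
toy <- expand.grid(Gene = c("geneA", "geneB"),
                   Genotype = c("WT", "mut1"),
                   Time = c("0T", "1T", "3T", "5T"),
                   Replicate = 1:3)
toy$response <- rnorm(nrow(toy), mean = 1, sd = 0.2)

# One two-way ANOVA per gene, followed by Tukey HSD on the Genotype:Time cells
tukey_by_gene <- lapply(split(toy, toy$Gene), function(d) {
  fit <- aov(response ~ Genotype * Time, data = d)
  # Matrix with columns diff, lwr, upr, "p adj"; row names look like "mut1:0T-WT:0T"
  TukeyHSD(fit, which = "Genotype:Time")[["Genotype:Time"]]
})

# Compact letter display per gene (optional: only runs if multcompView is installed)
if (requireNamespace("multcompView", quietly = TRUE)) {
  letters_by_gene <- lapply(tukey_by_gene, function(tk) {
    multcompView::multcompLetters(tk[, "p adj"])$Letters
  })
}
```

Each element of tukey_by_gene holds the pairwise comparisons for one gene only, so the results are no longer averaged across genes. Cells that share a letter are not significantly different at the default 0.05 threshold.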
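<strong>2. ggplot()</strong>
You have not defined colours or average_measure in the code you provided; calling these undeclared variables will cause a failure.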
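Structurally, I suggest working from df1 and letting ggplot do the grouping, rather than using group_by to build df2. Then use stat="summary" with fun.y="mean" (named fun in ggplot2 3.3.0 and later) to carry out the summary calculations you performed in df2. Doing it this way also lets you use the mean_se function for the error bars. Like this: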
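library(ggplot2)

ggplot(df1, aes(x = Time, y = response, fill = Genotype)) +
  # fun replaces fun.y in ggplot2 3.3.0 and later
  geom_bar(stat = 'summary', fun = 'mean', position = position_dodge(0.9)) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               color = "gray60", width = .1, position = position_dodge(0.9)) +
  scale_fill_manual(values = c("steelblue", "orange")) +
  facet_wrap(~`Gene`) +
  labs(x = 'Time', y = 'Response')
Finally, note that using separate() in df1 raises a warning, but not an error, because splitting on sep="_" yields an extra value. You can avoid this (if it causes confusion) by adding one more column name to capture the final value (which appears to be a time index).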
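As a small sketch of that fix, the call below passes four names to separate() so that every piece of the column name has somewhere to go. The column name Replicate is an assumption (the trailing number could equally be read as a time index), and the two-row toy data frame is made up purely to keep the example self-contained.

```r
library(tidyr)

# Two made-up rows using the Genotype_Tissue_Time_index naming pattern
toy <- data.frame(var = c("WT_Tissue_0T_1", "mut1_Tissue_5T_3"),
                  response = c(0.89, 0.61))

# Four names for four pieces; "Replicate" is an assumed label for the last one
toy_long <- separate(toy, var,
                     into = c("Genotype", "Tissue", "Time", "Replicate"),
                     sep = "_")
```

With the fourth name in place, no pieces are discarded, so the warning goes away and the index stays available for later grouping.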