Data normalization in ggplot

Time: 2019-07-17 21:22:47

Tags: r ggplot2

My data is:

melted.df <- structure(list(organisms = structure(c(1L, 1L, 1L, 2L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L), .Label = c("Botrytis cinerea", "Fusarium graminearum", 
"Human", "Mus musculus"), class = "factor"), types = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("AllMismatches", 
"mismatchType2", "MismatchesType1", "totalDNA"), class = "factor"), 
    mutations = c(30501L, 12256L, 58357L, 366531L, 3475L, 186907L, 
    253453L, 222L, 24906L, 2775L, 247990L, 12324L, 4395L, 25324L, 
    77862L, 1862L, 112217L, 163117L, 100L, 17549L, 1057L, 20331L, 
    18177L, 7861L, 33033L, 288669L, 1613L, 74690L, 90336L, 122L, 
    7357L, 1718L, 227659L, 635951L, 229493L, 868052L, 2418724L, 
    65833L, 1081903L, 1339758L, 4318L, 59387L, 15199L, 2134229L
    )), row.names = c(NA, -44L), class = "data.frame")

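The totalDNA value in the types column is the total amount of DNA in the data, while the mismatch types are mutations. I want to normalize the data by the totalDNA value and plot it. My current plot does not show the data properly: totalDNA inflates the whole y axis, so the other three types (mismatchType2, MismatchesType1 and AllMismatches) are squashed together and hard to read next to it. What is a better way to plot this? Should I calculate percentages first, or can I use log scaling? Thanks for your help.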

ggplot(melted.df, aes(x = types, y = mutations, color=types)) +       
  geom_point()+
  facet_grid(.~organisms)+
  xlab("Types")+
  ylab("Mismatches")+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

1 answer:

Answer 0 (score: 3)

Have you tried a log scale?

ggplot(melted.df, aes(x = types, y = mutations, color=types)) +       
  geom_point()+
  facet_grid(.~organisms)+
  xlab("Types")+
  ylab("Mismatches")+
  # ylim(c(90,130))+
  scale_y_log10()+ #add log scale
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

How would you normalize by totalDNA? Would you use the (geometric) mean?
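For the percentage route asked about in the question, here is a minimal sketch, not a definitive method. The helper `normalize_by_total()` and the columns `replicate`, `total` and `pct_of_total` are made-up names, not from the original post. It also assumes that, within each organism, the k-th row of every type comes from the same sample, so rows are paired by order of appearance. The data has no sample ID to confirm this, so check it against how the data was produced.

```r
# Hypothetical helper (not from the original post): express each count as a
# percentage of the totalDNA count of the same sample.
# ASSUMPTION: within each organism, the k-th row of every type belongs to the
# same sample, so rows are paired by their order of appearance.
normalize_by_total <- function(df, total_label = "totalDNA") {
  # Running index (1, 2, 3, ...) within each organism/type combination
  df$replicate <- ave(seq_len(nrow(df)), df$organisms, df$types,
                      FUN = seq_along)

  # One totalDNA count per organism/replicate
  totals <- df[df$types == total_label,
               c("organisms", "replicate", "mutations")]
  names(totals)[3] <- "total"

  # Attach the matching total to every non-total row
  out <- merge(df[df$types != total_label, ], totals,
               by = c("organisms", "replicate"))
  out$pct_of_total <- 100 * out$mutations / out$total

  # Drop the now unused totalDNA level so it does not show up in the legend
  out$types <- factor(out$types)
  out
}

# Plot only if the question's melted.df is present in the workspace
if (exists("melted.df")) {
  library(ggplot2)
  normalized.df <- normalize_by_total(melted.df)
  p <- ggplot(normalized.df,
              aes(x = types, y = pct_of_total, color = types)) +
    geom_point() +
    facet_grid(. ~ organisms) +
    ylab("Mismatches (% of totalDNA)") +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank())
  print(p)
}
```

Since totalDNA becomes the denominator, it no longer needs its own points, and the three mismatch types share a 0 to 100 scale. If the percentages still span several orders of magnitude, `scale_y_log10()` can be added on top, as in the answer above.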