Question

我有一个看起来像这样的data.frame：

                 HSP90AA1      SSH2      ACTB TotalTranscripts
ESC_11_TTCGCCAAATCC  8.053308 12.038484 10.557234         33367.23
ESC_10_TTGAGCTGCACT  9.430003 10.687959 10.437068         30285.41
ESC_11_GCCGCGTTATAA  7.953726  9.918988 10.078192         30133.94
ESC_11_GCATTCTGGCTC 11.184402 11.056144  8.316846         24857.07
ESC_11_GTTACATTTCAC 11.943733 11.004500  9.240883         23629.00
ESC_11_CCGTTGCCCCTC  7.441695  9.774733  7.566619         22792.18

TotalTranscripts列按降序排序。我想要做的是使用ggplot2生成三个条形图，每个条形图对应于data.frame的每一列，但TotalTranscripts除外。我希望条形图按TotalTranscripts排序，就像data.frame一样。我希望使用小平面包装将这些条形图放在一个图上。

任何帮助将不胜感激！谢谢！

编辑：这是我使用barplot（）的当前代码。

cells = "ESC"
genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]

barplot(as.matrix(g[1]), beside=TRUE, names.arg=paste(rownames(g)," (",g$TotalTranscripts,")",sep=""), las=2, col="light blue", cex.names=0.3, main=paste(colnames(g)[1], "\nCells sorted by total number of transcripts (colSums)", sep=""))

这将生成一个看起来像this的图。

同样，我似乎在这里遇到的问题是如何在同一图像上有多个这些图。我想在这个data.frame中添加20多个列，但为了简单起见，我将其减少到3个。

编辑：当前代码包含以下答案

cells = "ESC"
genes = rownames(data[x,])[1:8]
# genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
g$rowz <- row.names(g)
g$Cells <- reorder(g$rowz, rev(g$TotalTranscripts))
df1 <- melt(g, id.vars = c("Cells", "TotalTranscripts"), measure.vars=genes)
ggplot(df1, aes(x = Cells, y = value)) + geom_bar(stat = "identity") +
  theme(axis.title.x=element_blank(), axis.text.x = element_blank()) +
  facet_wrap(~ variable, scales = "free") + 
  theme_bw() + theme(axis.text.x = element_text(angle = 90))

Answer 1

以下是其他任何人的示例数据：

TypeError                                 Traceback (most recent call         last)
<ipython-input-12-dbef5920f124> in <module>()
----> 1 df[df['ID'].map(lambda ID: "342270" in ID)]

/home/noteleks/anaconda3/lib/python3.5/site-packages/pandas                /core/series.py in map(self, arg, na_action)
                                                                                  2119                                                                               index=self.index).__finalize__(self)
   2120         else:
   -> 2121             mapped = map_f(values, arg)
  2122             return self._constructor(mapped,
  2123                                          index=self.index).__finalize__(self)

pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:63043)()

<ipython-input-12-dbef5920f124> in <lambda>(ID)
----> 1 df[df['ID'].map(lambda ID: "342270" in ID)]

TypeError: argument of type 'numpy.int64' is not iterable

这是一个解决方案：

df <- structure(list(HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 
                                  11.943733, 7.441695), SSH2 = c(12.038484, 10.687959, 9.918988, 
                                                                 11.056144, 11.0045, 9.774733), ACTB = c(10.557234, 10.437068, 
                                                                                                         10.078192, 8.316846, 9.240883, 7.566619), TotalTranscripts = c(33367.23, 
                                                                                                                                                                        30285.41, 30133.94, 24857.07, 23629, 22792.18)), .Names = c("HSP90AA1", 
                                                                                                                                                                                                                                    "SSH2", "ACTB", "TotalTranscripts"), class = "data.frame", row.names = c("ESC_11_TTCGCCAAATCC", 
                                                                                                                                                                                                                                                                                                             "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA", "ESC_11_GCATTCTGGCTC", 
                                                                                                                                                                                                                                                                                                             "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))

Kohske的example是ggplot2中订购元素的常量参考。

如果你有很多列，但有六个ESC复合体，你可以切换分组，即#New column for row names so they can be used as x-axis elements df$rowz <- row.names(df) #Explicitly order the rows (see the Kohske link) df$rowz1 <- reorder(df$rowz, rev(df$TotalTranscripts)) library(reshape2) #Melt the data from wide to long df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"), measure.vars = c("HSP90AA1", "SSH2", "ACTB")) library(ggplot2) gp <- ggplot(df1, aes(x = rowz1, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable, scales = "free") + theme_bw() gp + theme(axis.text.x = element_text(angle = 90))和x = variable，但这从根本上改变了你可视化/比较数据的方式。另外，如果您可以按2个组件组织列，请考虑facet_wrap(~ rowz1)（列是融入“变量”和“值”的数据）。

这个额外的SO solution与您的问题无关，但它是一种优雅的方式，可以按其值对每个方面中的元素进行重新排序（以供将来参考）。

最后，为您提供最佳控制的方法是分别绘制每个图形并组合凹槽。 Baptiste的软件包如 gridExtra 和 gtable 对这些任务非常有用。