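I'm trying to create a rule that assigns a specific color code to each unique string, so that I can plot different files in ggplot2 with consistent colors. For example, suppose I have two tab-delimited files, file1.txt and file2.txt, that look like this: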
FILE1.TXT
Freq Seq
90 AAGTGT
3 AAGTGG
3 AAGTCC
2 AATTTT
2 TTTTTT
FILE2.TXT
Freq Seq
91 AAGTGT
4 AAGTGG
2 AAGTCC
2 CCCCCC
1 TTTTTT
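With 6 distinct sequences (AAGTGT, AAGTGG, AAGTCC, CCCCCC, TTTTTT, AATTTT), the files above need 6 different colors in total. Across my many files I have ~3000 colors, and I have already created a palette (pal) to use: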
pal<-c(randomColor(count=2951))
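Is there a way to make sure that every sequence keeps the same pairing of string and hex color code across all of my files (i.e., every file that shows the AAGTGT sequence uses the same hex color code for that string)? Note that not all 3000 colors are represented in every file.
Thanks!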
Answer 0 (score: 1)
Hope this helps!
library(ggplot2)
library(randomcoloR)
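# Build a palette mapping from the 'Seq' values of all available data frames
set.seed(123)
all_seqs <- unique(c(as.character(df1$Seq), as.character(df2$Seq)))
# One color per unique sequence, so the count is not hard-coded
pal <- randomColor(count = length(all_seqs))
pal_seq_mapping <- data.frame(sequence = all_seqs, color = pal, stringsAsFactors = FALSE)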
# Example plot on the 'df1' data frame
ggplot(df1, aes(x=Seq, y=Freq)) +
  geom_bar(stat="identity", fill=pal_seq_mapping[match(df1$Seq, pal_seq_mapping$sequence),"color"]) +
  theme_bw()
# Example plot on the 'df2' data frame
ggplot(df2, aes(x=Seq, y=Freq)) +
  geom_bar(stat="identity", fill=pal_seq_mapping[match(df2$Seq, pal_seq_mapping$sequence),"color"]) +
  theme_bw()
# Sample data (dput() output; define df1 and df2 before running the code above)
df1 <- structure(list(Freq = c(90L, 3L, 3L, 2L, 2L), Seq = structure(c(3L,
2L, 1L, 4L, 5L), .Label = c("AAGTCC", "AAGTGG", "AAGTGT", "AATTTT",
"TTTTTT"), class = "factor")), .Names = c("Freq", "Seq"), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(Freq = c(91L, 4L, 2L, 2L, 1L), Seq = structure(c(3L,
2L, 1L, 4L, 5L), .Label = c("AAGTCC", "AAGTGG", "AAGTGT", "CCCCCC",
"TTTTTT"), class = "factor")), .Names = c("Freq", "Seq"), class = "data.frame", row.names = c(NA,
-5L))
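A possible variation on the same idea, in case it helps: store the mapping as a named character vector and hand it to scale_fill_manual(), so ggplot2 matches colors to sequences by name instead of relying on row order. This is only a minimal sketch. The names dfs, all_seqs, seq_colors and plot_seq are made up for illustration, the two data frames are toy copies of the files in the question, and hcl.colors() (base R 3.6 or later) is just a stand-in for whatever palette you already have, such as the one from randomColor().

```r
library(ggplot2)

# Toy stand-ins for file1.txt and file2.txt from the question
df1 <- data.frame(Freq = c(90, 3, 3, 2, 2),
                  Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "AATTTT", "TTTTTT"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Freq = c(91, 4, 2, 2, 1),
                  Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "CCCCCC", "TTTTTT"),
                  stringsAsFactors = FALSE)
dfs <- list(file1 = df1, file2 = df2)  # hypothetical container for all files

# Every distinct sequence across all data frames, sorted for a reproducible order
all_seqs <- sort(unique(unlist(lapply(dfs, function(d) as.character(d$Seq)),
                               use.names = FALSE)))

# Named vector: names are sequences, values are hex color codes.
# hcl.colors() is a placeholder; any vector of length(all_seqs) colors would do.
seq_colors <- setNames(hcl.colors(length(all_seqs), palette = "Dark 3"), all_seqs)

# Map fill to Seq inside aes() and let the manual scale look colors up by name
plot_seq <- function(d) {
  ggplot(d, aes(x = Seq, y = Freq, fill = Seq)) +
    geom_col() +
    scale_fill_manual(values = seq_colors) +
    theme_bw()
}

plots <- lapply(dfs, plot_seq)  # one plot per file, all sharing seq_colors
```

Because the lookup is by name, a file that contains only some of the sequences still gets the same color for each sequence it does contain. If the mapping has to survive between R sessions, writing seq_colors to disk with saveRDS() and reading it back with readRDS() should keep it fixed.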