I have the directory structure shown below:
Folder named A contains txt files named 1, 2, 3, .., 5
Folder named B contains txt files named 1, 2, 3, .., 5
|
--A (Folder)
|---1.txt
|---2.txt
....
|---5.txt
--B (Folder)
|---1.txt
|---2.txt
....
|---5.txt
I am reading these text files into data frames using two nested for loops. A single data frame looks like this:
df <- data.frame(Comp.1 = c(0.3, -0.2, -1, NA, 1),
Comp.2 = c(-0.4, -0.1, NA, 0, 0.6),
Comp.3 = c(0.2, NA, -0.4, 0.3, NA))
row.names(df) <- c("Param1", "Param2", "Param3", "Param4", "Param5")
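The values are always between -1 and +1. The data frames do not all have the same number of rows (parameters) and columns (components). For example, the data frame above is 3x5; other data frames could be 5x15, 4x10, 5x40, etc.
I want a plot with the following properties: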
1. parameters on x-axis
2. components on y-axis
3. values as points in the above graph
4. shape of point representing folder name (A = square, B = triangle, C = circle, .., E)
5. color inside the point shape representing file name (1, 2, 3, .., 5)
6. color intensity describing value (For eg: light red [almost white] color representing closer to -1 like -0.98, dark red representing closer to 1 like 0.98)
I have this code:
library(ggplot2)
library(reshape2)

alphabets = c("A", "B", "C", "D", "E", "F")
numbers = c(1, 2, 3, 4, 5)
pca.plot <- ggplot(data = NULL, aes(xlab = "Principal Components", ylab = "Parameters"))
for (alphabet in alphabets) {
  for (number in numbers) {
    filename = paste("/filepath/", alphabet, "/", number, ".txt", sep = "")
    df <- read.table(filename)
    # Making all row dimensions = 62. Adding rows with NAs
    if (length(row.names.data.frame(df)) < 62) {
      row_length = length(row.names.data.frame(df))
      for (i in row_length:61) {
        new_row = c(NA, NA, NA, NA, NA, NA)
        df <- rbind(df, new_row)
      }
    }
    df$row.names <- rownames(df)
    long.df <- melt(df, id = c("row.names"), na.rm = TRUE)
    pca.plot <- pca.plot + geom_point(data = long.df, aes(x = variable, y = row.names, shape = number, color = alphabet, size = value))
  }
}
Edit: Following the steps @Gregor mentioned in the comments, I now have a big_data_frame like this:
head(big_data, 3)
  Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 params alphabet number
1     NA     NA     NA     NA     NA param1        A      1
2     NA     NA     NA   0.89     NA param2        A      1
3     NA  -0.95     NA     NA     NA param3        A      1
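The steps from the comments are not reproduced here. As a rough sketch of one way such a combined data frame could be assembled, the snippet below reads each file, tags it with its folder and file number, and row-binds the results while padding missing Comp columns with NA. The helper names read_one and bind_fill, the temporary directory, and the toy values are all assumptions made for illustration; none of them come from the original post.

```r
# Toy setup (assumption): two folders with one small file each, written to a
# temporary directory so the sketch runs without the real "/filepath/" data.
root <- file.path(tempdir(), "pca_demo")
for (alphabet in c("A", "B")) {
  dir.create(file.path(root, alphabet), recursive = TRUE, showWarnings = FALSE)
}
toy_a <- data.frame(Comp.1 = c(0.3, -0.2), Comp.2 = c(-0.4, NA),
                    row.names = c("Param1", "Param2"))
toy_b <- data.frame(Comp.1 = c(0.1, NA, 1), Comp.2 = c(NA, 0, 0.6),
                    Comp.3 = c(0.2, NA, -0.4),
                    row.names = c("Param1", "Param2", "Param3"))
write.table(toy_a, file.path(root, "A", "1.txt"))
write.table(toy_b, file.path(root, "B", "1.txt"))

# Hypothetical helper: read one file and record where it came from.
read_one <- function(root, alphabet, number) {
  df <- read.table(file.path(root, alphabet, paste0(number, ".txt")))
  df$params <- rownames(df)
  df$alphabet <- alphabet
  df$number <- number
  rownames(df) <- NULL
  df
}

# Hypothetical helper: row-bind data frames with different column sets,
# adding any missing column as NA first.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}

# Collect every file first, then bind once at the end.
pieces <- list()
for (alphabet in c("A", "B")) {
  for (number in 1) {
    pieces[[length(pieces) + 1]] <- read_one(root, alphabet, number)
  }
}
big_data <- bind_fill(pieces)
```

Binding once at the end avoids growing a data frame inside the loop, and it removes the need to pad every file to a fixed number of rows.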
Answer 0 (score: 1)
You need to melt the data frame to collapse all the Comp columns. The other columns should stay as they are:
long_data = reshape2::melt(
  big_data,
  id.vars = c("params", "alphabet", "number"),
  variable.name = "comp",
  value.name = "value",
  na.rm = TRUE
)
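To make the effect of this reshaping concrete, here is a base R sketch of the same wide-to-long step on a toy data frame. It is not how reshape2 implements melt, only an illustration of the result; the toy values are made up, and unlike melt it leaves comp as a character column rather than a factor.

```r
# Toy stand-in for big_data (values are made up for illustration).
big_data <- data.frame(
  Comp.1 = c(NA, NA, 0.5),
  Comp.2 = c(NA, -0.95, NA),
  params = c("param1", "param2", "param3"),
  alphabet = "A",
  number = 1
)

id_cols <- c("params", "alphabet", "number")
comp_cols <- setdiff(names(big_data), id_cols)

# Stack one block of rows per Comp column, keeping the id columns alongside.
long_data <- do.call(rbind, lapply(comp_cols, function(cc) {
  data.frame(big_data[id_cols], comp = cc, value = big_data[[cc]])
}))

# Equivalent of na.rm = TRUE: drop rows that carry no value.
long_data <- long_data[!is.na(long_data$value), ]
```

Each remaining row is one (params, comp) cell of one file, which is the shape ggplot needs to draw one point per value.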
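Now most of your requirements are straightforward:
1. parameters on x-axis
2. components on y-axis
3. values as points in the above graph
4. shape of point representing folder name (A = square, B = triangle, C = circle, .., E)
5. color inside the point shape representing file name (1, 2, 3, .., 5)
6. color intensity describing value (For eg: light red [almost white] color representing closer to -1 like -0.98, dark red representing closer to 1 like 0.98)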
ggplot(long_data, aes(
  x = params, y = comp, size = value,
  # big_data stores the folder name in the `alphabet` column
  shape = alphabet, color = factor(number), alpha = value
)) +
  geom_point()
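The default shape scale will not necessarily give the square/triangle/circle assignment asked for. A possible sketch, assuming the long data has the columns params, comp, value, alphabet, and number (the toy values here are made up): the fillable point shapes 22, 24, and 21 let fill carry the file number while shape carries the folder. The name shape_map is hypothetical.

```r
library(ggplot2)

# Toy long-format data (made-up values) with the same column names as long_data.
long_data <- data.frame(
  params = c("param1", "param2", "param1", "param2"),
  comp = c("Comp.1", "Comp.1", "Comp.2", "Comp.2"),
  value = c(0.89, -0.95, 0.2, -0.4),
  alphabet = c("A", "B", "C", "A"),
  number = c(1, 2, 1, 2)
)

# Fillable point shapes: 22 = square, 24 = triangle, 21 = circle.
shape_map <- c(A = 22, B = 24, C = 21)

p <- ggplot(long_data, aes(
  x = params, y = comp,
  shape = alphabet, fill = factor(number), alpha = value
)) +
  geom_point(size = 4, colour = "black") +
  scale_shape_manual(values = shape_map)
```

Because the fill legend is drawn with a non-fillable shape by default, adding guides(fill = guide_legend(override.aes = list(shape = 21))) may be needed to make the legend colors visible.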
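The tricky part is the requirement for both color intensity and overall color. The only way I know of to approximate this with standard ggplot is to use transparency, as I did above. This is the approach taken, for example, in this question.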
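One detail worth making explicit: by default ggplot2 rescales alpha to the range of the values actually present, so the same value could look different from one plot to the next. Below is a small sketch of a fixed mapping from [-1, 1] to an opacity. The function name value_to_alpha and the 0.1 floor are assumptions chosen for illustration.

```r
# Map a value in [-1, 1] linearly onto an opacity in [min_alpha, 1].
# value = -1 gives min_alpha (nearly transparent); value = 1 gives 1 (opaque).
value_to_alpha <- function(value, min_alpha = 0.1) {
  stopifnot(all(value >= -1 & value <= 1, na.rm = TRUE))
  min_alpha + (1 - min_alpha) * (value + 1) / 2
}

alphas <- value_to_alpha(c(-0.98, 0, 0.98))
```

In a ggplot call this could be used as alpha = value_to_alpha(value) together with scale_alpha_identity(). Alternatively, scale_alpha_continuous(limits = c(-1, 1)) fixes the input range without a helper.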
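Note that this is untested, because your data was not shared in a reproducible form. If there are problems that need testing, please share the data with dput as suggested in the comments.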