我有以下数据:
Project Topic C10 C14 C03 C11 C16 C08
P1 T1 0.24 0.00 0.00 0.04 0.04 0.00
P1 T2 0.00 0.30 0.00 0.00 0.00 0.00
P1 T3 0.04 0.04 0.00 0.24 0.00 0.00
P1 T4 0.00 0.00 0.00 0.04 0.33 0.04
P1 T5 0.00 0.09 0.21 0.00 0.00 0.00
P1 T6 0.00 0.09 0.00 0.00 0.00 0.34
P2 T1 0.20 0.00 0.00 0.04 0.00 0.04
P2 T2 0.00 0.22 0.04 0.00 0.00 0.00
P2 T3 0.04 0.00 0.00 0.24 0.00 0.00
P2 T4 0.00 0.00 0.04 0.00 0.33 0.00
P2 T5 0.04 0.00 0.21 0.00 0.00 0.00
P2 T6 0.00 0.04 0.00 0.00 0.00 0.34
P3 T1 0.20 0.00 0.00 0.00 0.08 0.00
P3 T2 0.00 0.17 0.00 0.00 0.00 0.00
P3 T3 0.00 0.00 0.00 0.08 0.00 0.00
P3 T4 0.00 0.04 0.00 0.04 0.24 0.00
P3 T5 0.00 0.00 0.21 0.00 0.00 0.04
P3 T6 0.00 0.09 0.00 0.00 0.00 0.22
......
我想要做的是将以上数据创建到以下图中:
在此草图中,条形的高度属于C#s'值,它应该有六种颜色。每个条形图都属于P#的数据集。
我尝试使用以下代码将每个P#s数据集复制到.csv文件中,并使用par(mfrow=c(5,3))
将其绘制在同一个绘图框中:
library(e1071)
topics <- read.csv("P1.csv", head=TRUE)
dput(head(topics))
pdf("cosinesimilarityplots.pdf", family="Times")
par(mfrow=c(5,3))
colours <- c("red", "orange", "yellow", "green","blue"," black")
barplot(as.matrix(topics), main="Project Name", ylab="", cex.lab = 1.5, cex.main = 1.4, beside=TRUE, col=colours,ylim=c(0, 0.5))
title(ylab=expression(paste("Cose(", theta, ")")),xlab="Seeded-LDA topics", line=2, cex.lab=1.2)
legend("topleft", c("C10: Resource Management (RM)","C14:Cross Site Scripting (XSS)","C03:Authentication Abuse (AA)","C11:Buffer Overflow (BoF)","C16:Access Privileges (AP)","C08:SQL Injection (SI)"), cex=0.85, bty="n", fill=colours)
dev.off()
dput(head(topics))
的结果如下:
structure(list(T1 = c(0.24, 0, 0, 0.04, 0.04, 0), T2 = c(0.24,
0.3, 0, 0, 0, 0), T3 = c(0.04, 0.04, 0, 0.24, 0, 0), T4 = c(0,
0, 0, 0.04, 0.33, 0.04), T5 = c(0, 0.09, 0.21, 0, 0, 0), T6 = c(0,
0.09, 0, 0, 0, 0.34)), .Names = c("T1", "T2", "T3", "T4", "T5",
"T6"), row.names = c(NA, 6L), class = "data.frame")
然后,我意识到条形图的质量变得非常低,如果P#的数量大于15,那么将每个P#的数据绘制在一个单独的.csv文件中将会特别耗费。
在不将主数据集文件拆分为较小文件的情况下,有效绘制主数据集文件的方法是什么?优选使用R
答案 0 :(得分:0)
您可以使用ggplot2
和gridextra
创建一个类似于您使用dplyr
和reshape2
获得一些帮助的草图。通常情况下,仅仅因为你有能力在R中做某事并不意味着它是直观的。基本上,您必须为每个项目创建单独的绘图对象,去掉图例,然后使用grid.arrange()
重新组合所有内容。
library(tidyverse) # ggplot2, dplyr, etc
library(reshape2) # Outdated but still works
library(gridExtra) # Allows us to put plots into grids
# Generate some dummy data
data <- tibble(
Project = rep(paste0("P", 1:6), length = 30),
C10 = abs(rnorm(30)),
C14 = runif(30),
C03 = sample(1:30) / 50,
C11 = rnorm(30) ^ 2,
C16 = abs(rnorm(30) / 2),
C08 = abs(rnorm(30) * 2)
)
data <- data %>%
arrange(Project) %>%
mutate(Topic = rep(paste0("T", 1:5), length = 30))
# Melt the data from wide to long format
data <- melt(data, id.vars = c("Project", "Topic"))
#########################################################
# Now you can actually create the chart
#########################################################
# Use a function to create a version of the plot for each Project
plot_proj <- function(projnum) {
filter(data, Project == projnum) %>%
rename(Legend = variable) %>%
ggplot(., aes(x = Topic, y = value, fill = Legend)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "", y = "", title = projnum) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5),
panel.border = element_blank())
}
# Create a separate plot for each Project
plots <- map(unique(data$Project), plot_proj)
# This function was borrowed from an older StackOverflow answer
# Source: http://stackoverflow.com/questions/13649473/add-a-common-legend-for-combined-ggplots
g_legend <- function(a.gplot) {
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x)
x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
mylegend <- g_legend(plots[[1]])
# Combine the plots and add one
grid.arrange(
arrangeGrob(
plots[[1]] + theme(legend.position = "none"),
plots[[2]] + theme(legend.position = "none"),
plots[[3]] + theme(legend.position = "none"),
plots[[4]] + theme(legend.position = "none"),
plots[[5]] + theme(legend.position = "none"),
plots[[6]] + theme(legend.position = "none"),
left = mylegend
)
)