Question

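I am trying to visualize statistics from a bug tracking system.

What I want is an overview of incoming and fixed bug tickets, and I think a bar chart would be a good solution.

I have bought Hadley Wickham's book "ggplot2: Elegant Graphics for Data Analysis" and tried to understand how geoms and stats work, but without additional support I think I would need far more time than I can spend studying it.

It would be great if you could help me get an overview of incoming/fixed tickets based on the data table below (unfortunately, it does not seem possible to attach a CSV to a question).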

id external    in.date      fixed in.cw fixed.cw
 1        x 01.11.2013 15.11.2013  1344     1346
 2          07.11.2013             1345     <NA>
 3        x 15.11.2013             1346     <NA>
 4          01.11.2013 15.11.2013  1344     1346
 5        x 07.11.2013 20.11.2014  1345     1447
 6          15.11.2013             1346     <NA>
 7        x 01.11.2013             1344     <NA>
 8          07.11.2013 05.01.2014  1345     1402
 9        x 15.11.2013 05.01.2014  1346     1402
10          01.11.2013 05.01.2014  1344     1402
11        x 07.11.2013             1345     <NA>
12          15.11.2013             1346     <NA>
13        x 01.11.2013 01.03.2014  1344     1409
14          07.11.2013 01.04.2014  1345     1414
15        x 15.11.2013             1346     <NA>
16          01.11.2013 01.05.2014  1344     1418

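I think it would give a good overview if the fixed tickets were added to the chart as an additional layer.

Is it also possible to define a transparency value so that the overlaid bars remain visible?

Some entries in my data are also flagged as external tickets. I would like to visualize these somehow, showing the share that comes from external reporters by filling the external tickets' portion of each bar with a pattern.

Later I would also like to include ticket priority, but like adding a forecast, a legend, and so on, that will be one of the next steps.

Here is what I have so far: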

p <- ggplot(data=table) + stat_bin( aes(x=factor(in.cw), y=..count.., fill = factor(external)))  
p +  stat_bin(data=table, aes(x=factor(fixed.cw), y=..count..))#, fill = factor(external))

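Which I think is not bad for a start. :)

Can you tell me how to give the second layer, the one with the fixed.cw tickets, a fill color, and how to define a line type for it?

Here is a dump of the data used: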

> dput(table)
structure(list(id = 1:16, external = c("x", "", "x", "", "x", 
"", "x", "", "x", "", "x", "", "x", "", "x", ""), in.date = c("01.11.2013", 
"07.11.2013", "15.11.2013", "01.11.2013", "07.11.2013", "15.11.2013", 
"01.11.2013", "07.11.2013", "15.11.2013", "01.11.2013", "07.11.2013", 
"15.11.2013", "01.11.2013", "07.11.2013", "15.11.2013", "01.11.2013"
), fixed = c("15.11.2013", "", "", "15.11.2013", "20.11.2014", 
"", "", "05.01.2014", "05.01.2014", "05.01.2014", "", "", "01.03.2014", 
"01.04.2014", "", "01.05.2014"), in.cw = c("1344", "1345", "1346", 
"1344", "1345", "1346", "1344", "1345", "1346", "1344", "1345", 
"1346", "1344", "1345", "1346", "1344"), fixed.cw = c("1346", 
NA, NA, "1346", "1447", NA, NA, "1402", "1402", "1402", NA, NA, 
"1409", "1414", NA, "1418")), .Names = c("id", "external", "in.date", 
"fixed", "in.cw", "fixed.cw"), row.names = c(NA, -16L), class = "data.frame")

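This is just test data that I created.

'external' flags entries created by a customer.

'in.date' is the creation date.

'fixed' is the date on which the bug report was closed.

'in.cw' and 'fixed.cw' give the year and calendar week in which the report was created/closed.

To start with, I am trying to create a chart that gives an overview of incoming versus closed reports, ideally with the external channel separated from the other entries. A forecast based on the in.cw and fixed.cw values would also be nice.

Regards,
Wasili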

Answer 1

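I'm still not sure what kind of organization you are looking for, but this should get you started. It assumes you want to aggregate the data before plotting.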

library(ggplot2)
library(plyr)

 test<-structure(list(id = 1:16, external = c("x", "", "x", "", "x", 
 "", "x", "", "x", "", "x", "", "x", "", "x", ""), in.date = c("01.11.2013", 
 "07.11.2013", "15.11.2013", "01.11.2013", "07.11.2013", "15.11.2013", 
 "01.11.2013", "07.11.2013", "15.11.2013", "01.11.2013", "07.11.2013", 
 "15.11.2013", "01.11.2013", "07.11.2013", "15.11.2013", "01.11.2013"
 ), fixed = c("15.11.2013", "", "", "15.11.2013", "20.11.2014", 
 "", "", "05.01.2014", "05.01.2014", "05.01.2014", "", "", "01.03.2014", 
 "01.04.2014", "", "01.05.2014"), in.cw = c("1344", "1345", "1346", 
 "1344", "1345", "1346", "1344", "1345", "1346", "1344", "1345", 
 "1346", "1344", "1345", "1346", "1344"), fixed.cw = c("1346", 
 NA, NA, "1346", "1447", NA, NA, "1402", "1402", "1402", NA, NA, 
 "1409", "1414", NA, "1418")), .Names = c("id", "external", "in.date", 
 "fixed", "in.cw", "fixed.cw"), row.names = c(NA, -16L), class = "data.frame")

## code external/internal variable
test$origin<-ifelse(test$external=='x','external','internal')

## store dates as actual date objects
test$in.date<-as.Date(test$in.date,format='%d.%m.%Y')
test$fixed<-as.Date(test$fixed,format='%d.%m.%Y')

## calculate process time in days for completed records
test$fixtime<-test$fixed-test$in.date

## discretize process time into groups for summary purposes
test$fixtime_categories<-cut(as.numeric(test$fixtime),breaks=c(seq(1,100,40),Inf))


## summarize data by categorized process time and whether origin=external
summary_data <- ddply(test,
                      .(fixtime_categories,origin), summarise, 
                      records = length(id)) 

## plotting
ggplot(summary_data)+ 
  geom_bar(aes(x=fixtime_categories,y=records),stat="identity") +#,position="fill") +
  facet_wrap(~origin)+
  ggtitle('Process time by (external) filing status')

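This yields a chart showing the number of cases by how long they took to complete. Here, NA marks the cases that have not been completed yet; those can be omitted or included, depending on the use case. The left panel shows only the external cases; the right panel shows the internal ones.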
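The question also asked for the fixed tickets as a second, semi-transparent layer with its own line type. Below is a minimal sketch of one way to do that, assuming a current ggplot2 in which `geom_bar()` counts rows for a discrete x. The `tickets` data frame is a stand-in holding only the first five rows of the question's data, and the names `tickets`, `fixed`, and `p` are illustrative, not part of the original code. Origin is distinguished by fill colour rather than by a pattern in this sketch.

```r
library(ggplot2)

# Stand-in data: first five rows of the question's table (assumption: same column names)
tickets <- data.frame(
  in.cw    = c("1344", "1345", "1346", "1344", "1345"),
  fixed.cw = c("1346", NA, NA, "1346", "1447"),
  external = c("x", "", "x", "", "x"),
  stringsAsFactors = FALSE
)
tickets$origin <- ifelse(tickets$external == "x", "external", "internal")

# Only tickets with a closing week belong in the "fixed" layer
fixed <- tickets[!is.na(tickets$fixed.cw), ]

p <- ggplot() +
  # Layer 1: incoming tickets per calendar week, stacked by origin
  geom_bar(data = tickets, aes(x = in.cw, fill = origin), alpha = 0.5) +
  # Layer 2: fixed tickets per calendar week, drawn on top with a dashed outline
  geom_bar(data = fixed, aes(x = fixed.cw, fill = origin),
           alpha = 0.5, colour = "black", linetype = "dashed") +
  labs(x = "calendar week (yyww)", y = "tickets")

print(p)
```

Setting `alpha` outside `aes()` applies one constant transparency to the whole layer, so bars that overlap in the same week stay visible. `linetype` only affects the bar outline, which is why `colour` is set as well.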
