Question

I am working with data in the format shown in the example below (the actual data set is much longer). The column labels are: Date | Variable 1 | Variable 2 | Failed?

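I have sorted the data into date order. Some dates may be missing, but the ordering should take care of that. From there, I am trying to split the data into sets, where a 1 in the rightmost column (Failed) marks the boundary between sets. I then want to plot these sets on a single graph, with the number of days elapsed on the x-axis. I have looked at using ggplot, but it seems to require frames in which the length of each vector is known. I tried creating a matrix whose length is the maximum number of days across all sets and filling the spare cells with NaN values, but this takes a long time because my data set is very large. Is there a more elegant way to plot the values against days elapsed for all sets on one graph, and then repeat the process for the other variables?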
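Vectors of different lengths do not have to be padded to a common length in R: `split()` returns a list whose elements can each have their own length, and ggplot2 expects long-format data in which a grouping column records which set each row belongs to. A minimal sketch, using the Failed flags from the example below and placeholder values in place of a real variable (treating the failure row as the last row of its set, as the answer does):

```r
# Failure flags in date order (the Failed column of the example data)
failed <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0)
values <- seq_along(failed)  # placeholder measurements, one per day

# A new set starts on the row after a failure, so shift the flags
# down by one row before taking the running total
set_id <- cumsum(c(0, head(failed, -1)))

# split() gives one vector per set; the lengths differ and nothing is padded
sets <- split(values, set_id)
print(lengths(sets))
```

The same set id column, plus a day-within-set counter, is all ggplot2 needs to draw every set on one graph through its `group` aesthetic.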
Any help would be greatly appreciated.

Code for a reproducible example:

test <-matrix(c(
"01/03/1997",   0.521583294,    0.315170092,    0,
"02/03/1997",   0.63946859, 0.270870821,    0,
"03/03/1997",   0.698687101,    0.253495021,    0,
"04/03/1997",   0.828754157,    0.233024574,    0,
"05/03/1997",   0.87078867, 0.214507537,    0,
"06/03/1997",   0.883279874,    0.212268627,    0,
"07/03/1997",   0.952083969,    0.062663598,    0,
"08/03/1997",   0.991100195,    0.054875256,    0,
"09/03/1997",   0.992490126,    0.026610776,    1,
"10/03/1997",   0.020707391,    0.866874513,    0,
"11/03/1997",   0.32405139, 0.778696984,    0,
"12/03/1997",   0.32665243, 0.703234151,    0,
"13/03/1997",   0.603941956,    0.362869647,    0,
"14/03/1997",   0.944046386,    0.026992527,    1,
"15/03/1997",   0.108246142,    0.939363715,    0,
"16/03/1997",   0.152195386,    0.907458966,    0,
"17/03/1997",   0.285748169,    0.765212667,    0), ncol = 4, byrow=TRUE)
colnames(test) <- c("Date", "Variable 1", "Variable 2", "Failed")
test <-as.table(test)
test
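Because the example is built from a character matrix, every cell is stored as text, which is why the answer below has to convert columns with as.numeric(as.character(...)). A minimal sketch of building a typed data frame directly and sorting it by date instead; it assumes the dates are day/month/year as in the example and repeats only three rows, deliberately out of order:

```r
# Three rows of the example, shuffled to show the sort
raw <- matrix(c(
  "03/03/1997", 0.698687101, 0.253495021, 0,
  "01/03/1997", 0.521583294, 0.315170092, 0,
  "02/03/1997", 0.63946859,  0.270870821, 0
), ncol = 4, byrow = TRUE)

# Convert each column to its proper type once, up front
df <- data.frame(
  Date = as.Date(raw[, 1], format = "%d/%m/%Y"),
  `Variable 1` = as.numeric(raw[, 2]),
  `Variable 2` = as.numeric(raw[, 3]),
  Failed = as.integer(raw[, 4]),
  check.names = FALSE  # keep the spaces in the column names
)

# Put the rows in date order
df <- df[order(df$Date), ]
print(df)
```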

Answer 1

I managed to hack together a solution, but it looks very messy. I am sure there is a more elegant way to do this.

z <- as.data.frame.matrix(test)

x <- as.numeric(as.character(z$Failed))
x <- cumsum(x) # x is reused to hold the running total of failures

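Correcting the cumulative sum of failures (subtracting each row's own Failed value) puts the data into groups numbered by the count of previous failures, so the row that registers a failure stays in the set it ends.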
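To see why the correction is needed, compare the plain running total with the corrected one on a short vector of hypothetical flags:

```r
failed <- c(0, 0, 1, 0, 1, 0)

naive     <- cumsum(failed)           # 0 0 1 1 2 2: the failure row starts a new group
corrected <- cumsum(failed) - failed  # 0 0 0 1 1 2: the failure row closes its own group

print(rbind(failed, naive, corrected))
```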

z$acc_sum <- x - as.numeric(as.character(z$Failed)) # failure row stays in the set it ends
attach(z) # lets the next lines refer to acc_sum by name

z <- data.frame(z, Group_Index = ave(acc_sum == acc_sum, acc_sum, FUN = cumsum))

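This adds an extra column holding the number of days elapsed since the start of measurement in each set. Re-attaching the data frame so the new variable names can be used directly is easier to read than keeping track of column indices.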
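The `acc_sum == acc_sum` expression only serves to produce a vector of TRUE values, so that `cumsum` counts the rows within each group. An equivalent and arguably more readable form numbers the rows directly with `seq_along`; a small sketch on hypothetical group ids:

```r
acc_sum <- c(0, 0, 0, 1, 1, 2)

# ave() applies FUN within each group and returns a vector aligned with
# the input, so seq_along numbers the rows 1, 2, ... inside every set
group_index <- ave(seq_along(acc_sum), acc_sum, FUN = seq_along)
print(group_index)
```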

attach(z) # re-attach so Group_Index is also visible by name
n_sets <- max(acc_sum) + 1 # number of sets of variable results

The columns now read: Date | Variable 1 | Variable 2 | Failed | acc_sum | Group_Index

library(ggplot2)

n = data.frame(acc_sum, Group_Index)

Initialising the frame once, outside the loop, makes it faster, because Group_Index and acc_sum are not read in again on every iteration.

# Iterate over the variable columns; -4 excludes Date, Failed, acc_sum and Group_Index
for (j in seq_len(ncol(z) - 4)) {
  n$Variable <- z[, j + 1] # read in the next variable (assumes the variable columns are adjacent, starting at column 2)
  n[] <- lapply(n, function(x) as.numeric(as.character(x))) # make every column numeric for plotting

  p <- ggplot(n, aes(x = Group_Index, y = Variable, colour = factor(acc_sum))) +
    theme_bw() +
    geom_line(aes(group = acc_sum)) # linetype = "dotted" is another option
  print(p) # a ggplot inside a loop is only drawn when printed explicitly

  cat("Press [enter] to continue") # wait for user input before moving to the next variable
  line <- readline()
}
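If pausing with `readline()` between plots is not essential, another option is to stack all variable columns into one long data frame and draw a single faceted plot. A base-R sketch of the reshaping step, assuming the set id and day-within-set columns have already been computed (here called `set` and `day`, playing the role of acc_sum and Group_Index; the values are hypothetical, shortened from the example):

```r
z <- data.frame(
  set = c(0, 0, 0, 1, 1),
  day = c(1, 2, 3, 1, 2),
  `Variable 1` = c(0.52, 0.64, 0.70, 0.02, 0.32),
  `Variable 2` = c(0.32, 0.27, 0.25, 0.87, 0.78),
  check.names = FALSE
)
vars <- c("Variable 1", "Variable 2")

# One row per (observation, variable): the id columns are repeated once per
# variable, and unlist() concatenates the variable columns in the same order
long <- data.frame(
  set = rep(z$set, times = length(vars)),
  day = rep(z$day, times = length(vars)),
  variable = rep(vars, each = nrow(z)),
  value = unlist(z[vars], use.names = FALSE)
)
print(long)
```

The long frame could then be drawn once with `ggplot(long, aes(day, value, colour = factor(set), group = set)) + geom_line() + facet_wrap(~ variable, scales = "free_y")`, giving one panel per variable on a single figure.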

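The graph could be improved so that the y-axis label changes to match the variable being plotted. This could be done by setting ylab inside the for loop.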
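A sketch of that idea, assuming ggplot2 is installed: the loop runs over column names rather than positions, so the same name picks the data and becomes the y-axis label through `ylab()`. The small data frame is hypothetical, shortened from the example.

```r
library(ggplot2)

# Hypothetical data: set id, day within set, and two measured variables
z <- data.frame(
  set = c(0, 0, 0, 1, 1),
  day = c(1, 2, 3, 1, 2),
  `Variable 1` = c(0.52, 0.64, 0.70, 0.02, 0.32),
  `Variable 2` = c(0.32, 0.27, 0.25, 0.87, 0.78),
  check.names = FALSE
)

plots <- list()
for (v in c("Variable 1", "Variable 2")) {
  d <- data.frame(day = z$day, set = factor(z$set), value = z[[v]])
  plots[[v]] <- ggplot(d, aes(x = day, y = value, colour = set, group = set)) +
    theme_bw() +
    geom_line() +
    ylab(v) # the axis label follows the column being plotted
}
# print(plots[["Variable 1"]]) draws one of them
```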

Splitting data, then plotting uneven vector lengths on a single graph
