How to ... in R

Time: 2016-01-29 19:23:34

Tags: r plot graphics decision-tree binary-decision-diagram

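Let's assume a group of people is followed over time, and at 3 time points they are asked whether they would like to become a judge. Over time, some of them change their opinion. I would like to show graphically how the opinion (become a judge / not become a judge) changes over time. Here is an idea of how to show it:

[Figure: sketch of the proposed decision-path plot]

Here is how to read the plot: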

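  • 1,462 students were sampled; (400 + 295 + 22 + 147) = 864 of them wanted to become a judge (first branch going up).
  • A blue path means they eventually became a judge.
  • A black path means they ended up doing something else.
  • A line going up: they wanted to become a judge.
  • A line going down: they did not want to become a judge.
  • The thickness of a line is proportional to the number of people who followed this particular path (= the number plotted at the end of the path).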

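For example:
(a) 118 persons did not want to become a judge during high school and university, but during practice they decided to become a judge.
(b) Up to the practice stage, 695 persons wanted to become a judge; after the practice stage, 400 of them became judges and 295 did something else.

The main idea is to explore which decision paths exist and which decision paths are taken most often.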

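I have a few questions:

  1. Is there a name for this kind of chart?
  2. Is there already an R function that can draw this plot?
  3. If there is no R function: any ideas how I could draw this more nicely? For example: (3.1) I would like the curves to lie next to each other (no gaps between curves, no overlap). (3.2) The start and the end of each curve should be parallel to the y-axis.
  4. Any other suggestions?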

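    Edit 1:
    I found a plot similar to the one above: the riverplot, see e.g. the R library riverplot and R blogger. The drawback of a river plot is that the individual threads or paths get lost at the crossing points.

    Here is the data: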

    library(reshape2)
    library(ggplot2)
    
    # Data
    wide <- data.frame(  grp        = 1:8,
                        time1_orig = rep(8,8)
                      , time2_orig = rep(c(4,12), each = 4)
                      , time3_orig = rep(c(2,6,10,14), each = 2)
                      , time4_orig = seq(1,15,2)
                      , n           = c(409,118,38,33,147,22,295,400)  # number of persons on each path
                      , d           = c(1,0,1,0,1,0,1,0)               # final decision: 0 = became a judge ("Yes"), 1 = did something else ("No")
                      )
    
    wide
      grp time1_orig time2_orig time3_orig time4_orig   n d
    1   1          8          4          2          1 409 1
    2   2          8          4          2          3 118 0
    3   3          8          4          6          5  38 1
    4   4          8          4          6          7  33 0
    5   5          8         12         10          9 147 1
    6   6          8         12         10         11  22 0
    7   7          8         12         14         13 295 1
    8   8          8         12         14         15 400 0
    

    Here is the data transformation needed for the plot:

    w <- 500
    wide$time1 <- wide$time1_orig + (cumsum(wide$n)-(wide$n)/2)/w
    wide$time2 <- wide$time2_orig + (cumsum(wide$n)-(wide$n)/2)/w
    wide$time3 <- wide$time3_orig + (cumsum(wide$n)-(wide$n)/2)/w
    wide$time4 <- wide$time4_orig + (cumsum(wide$n)-(wide$n)/2)/w
    
    
    long<- melt(wide[,-c(2:5)], id = c("d","grp","n"))
    long$d<-as.character(long$d)
    str(long)
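
The subtotals that appear as hard-coded labels in the ggplot call (1462, 598, 864, 527, ...) can also be recomputed from the path counts. The following is only a minimal sketch, assuming that the `*_orig` columns identify the node a path passes through at each time point; the names `total`, `by_time2` and `by_time3` are introduced here purely for illustration.

```r
# Path counts and node positions, copied from the question's data
wide <- data.frame(
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  time4_orig = seq(1, 15, 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

# Everyone starts at the same node
total <- sum(wide$n)

# Number of persons passing through each node at the 2nd and 3rd time point
by_time2 <- tapply(wide$n, wide$time2_orig, sum)
by_time3 <- tapply(wide$n, wide$time3_orig, sum)

total
by_time2
by_time3
```

Comparing these sums with the labels typed into `geom_text()` is a cheap way to catch a mistyped number before plotting.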
    

    Here is the ggplot:

    gg1 <- ggplot(long, aes(x=variable, y=value, group=grp, colour=d)) +
              geom_line(aes(size=n)) +  # position_dodge() has no 'height' argument in current ggplot2, so it is omitted
              geom_text(aes(label=c( "1462",""   ,""   ,""   ,""   ,""   ,""   ,""
                                    ,""    ,""   ,"598",""   ,""   ,"864",""   ,""
                                    ,"527" ,""   ,""   ,"71" ,"169",""   ,""   ,"695"
                                ,"409" ,"118","38" ,"33" ,"147","22" ,"295","400"
                                )
                            , size = 300, vjust= -1.5)
                        ) +
               scale_colour_manual(name="",labels=c("Yes", "No"),values=c("royalblue","black")) +
               theme(legend.position = c(0,1),legend.justification = c(0, 1),
                     legend.text = element_text( size=12),
                     axis.text = element_text( size=12),
                     axis.title = element_text( size=15),
                     plot.title = element_text( size=15)) +
               guides(lwd="none") +
               labs(x="", y="Consider a judge career as an option:") +
               scale_y_discrete(labels="") +
               scale_x_discrete(labels = c(  "during high school"
                                           , "during university"
                                           , "during practice"
                                           , ""
                                        )
                                    )
    gg1
    

1 Answer:

Answer 0: (score: 1)

I found a solution thanks to the riverplot library, which gave me this plot:

[Figure: the resulting riverplot]

Here is the code:

library("riverplot")
# Create nodes
nodes <- data.frame(  ID     = paste(rep(c("O","C","R","D"),c(1,2,4,8)),c(1,1:2,1:4,1:8),sep="")
                    , x      = rep(0:3, c(1,2,4,8)) 
                    , y      = c(8, 12,4,14, 10,6,2, 15,13,11,9,7,5,3,1)
                    , labels = c("1462","864","598","695","169","71","527","400","295","22","147","33","38","118","409")
                    , col    = rep("lightblue", 15)
                    , stringsAsFactors= FALSE
                    )
# Create edges
edges <- data.frame(  N1 = paste(rep(c("O","C","R"), c(2,4,8)), rep(c(1,2,1,4:1)  , each=2), sep="")
                    , N2 = paste(rep(c("C","R","D"), c(2,4,8)), c(c(2:1,4:1,8:1)), sep="")
                    )

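# Each edge carries the count shown on its target node (N2),
# so that the flow into a node equals the flow out of it
edges$Value   <- as.numeric(nodes$labels[match(edges$N2, nodes$ID)])
edges$col     <- rep(c("black","royalblue"), 7)
edges$edgecol <- "col"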

# Create nodes/edges object
river <- makeRiver(nodes, edges)

# Define styles
style <-default.style()
style[["edgestyle"]]<-"straight"

# Plot
plot(river, default_style= style, srt=0, nsteps=200, nodewidth = 3)

# Add label
names <- data.frame (Time = c(" ", "during high school", "during university", "during practice")
                     ,hi  = c(0,0,0,0)
                     ,wi  = c(0,1,2,3)
                     )
with( names, text( wi, hi, Time) )
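
Because the node and edge tables above are typed by hand, it can help to derive the edge list from the path counts and check that it is balanced. The following is a rough sketch in base R only, not part of the riverplot API: the node IDs (`time2_12` and so on), the `stages` vector and the `balanced` flag are made up for illustration, and only the column names `N1`, `N2` and `Value` follow the edge table used above.

```r
# Path counts from the question: one row per complete path,
# each timeN column holds the node position at that stage
wide <- data.frame(
  time1 = rep(8, 8),
  time2 = rep(c(4, 12), each = 4),
  time3 = rep(c(2, 6, 10, 14), each = 2),
  time4 = seq(1, 15, 2),
  n     = c(409, 118, 38, 33, 147, 22, 295, 400)
)

stages <- c("time1", "time2", "time3", "time4")

# For each pair of consecutive stages, sum the path counts per (from, to) pair
edge_list <- lapply(seq_len(length(stages) - 1), function(k) {
  from <- paste0(stages[k], "_", wide[[stages[k]]])
  to   <- paste0(stages[k + 1], "_", wide[[stages[k + 1]]])
  agg  <- aggregate(wide$n, by = list(N1 = from, N2 = to), FUN = sum)
  names(agg)[3] <- "Value"
  agg
})
edges <- do.call(rbind, edge_list)

# Sanity check: for every intermediate node, inflow must equal outflow
inflow   <- tapply(edges$Value, edges$N2, sum)
outflow  <- tapply(edges$Value, edges$N1, sum)
inner    <- intersect(names(inflow), names(outflow))
balanced <- all(inflow[inner] == outflow[inner])

edges
balanced
```

If `balanced` is `FALSE` for a hand-written edge table, the values have most likely been assigned to the wrong edges.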

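Another option for plotting sequences of categorical information is:
TraMineR - Mining sequence data

TraMineR: a toolbox for exploring sequence data
  TraMineR is an R-package for mining, describing and visualizing sequences of states or events, and more generally discrete sequential data. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. Most of its features also apply, however, to non-temporal data such as text or DNA sequences.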