How to format data for a Plotly sunburst diagram

Asked: 2019-08-07 13:17:36

Tags: r plotly r-plotly sunburst-diagram

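I am trying to make a sunburst diagram with Plotly in R. I am struggling with the data model required for the hierarchy, both with how it works conceptually and with whether there is a simple way to convert a regular data frame, whose columns represent the different hierarchy levels, into the required format.

I have looked at sunburst examples in R, such as the one here, and at the reference page, but I still do not fully understand the model used for formatting the data.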

library(plotly)

# Create some fake data - say ownership and land use data with acreage
df <- data.frame(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)), 
                 landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                 acres=c(108,143,102, 300,320,500, 37,58,90))

# Just try some quick pie charts of acreage by landuse and ownership
plot_ly(data=df, labels= ~landuse, values= ~acres, type='pie')
plot_ly(data=df, labels= ~ownership, values= ~acres, type='pie')

# This doesn't render anything... not that I'd expect it to given the data format doesn't seem to match what's needed, 
# but this is what I'd intuitively expect to work
plot_ly(data=df, labels= ~landuse, parents = ~ownership, values= ~acres, type='sunburst')

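Given the sample code above, or something similar, it would be helpful to understand how to get from the data frame (df) to the format needed for a plottable sunburst diagram.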

2 answers:

Answer 0 (score: 2)

There is a package dedicated to this task, plotme:

library(plotme)
library(dplyr)

df %>% 
  rename(n = acres) %>% 
  count_to_sunburst()

To install the package, run:

devtools::install_github("yogevherz/plotme")

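More information about the package can be found here.

Answer 1 (score: 1)

You are absolutely right: compared with other, more intuitive uses of plotly's R API, preparing data for a sunburst chart is quite annoying.

I ran into the same problem and wrote a function based on library(data.table) to prepare the data. It accepts two different data.frame input formats.

The format needed to generate a sunburst diagram from data structured like yours is shown here, under the section "Sunburst with Repeated Labels".

For your example, it should look like this: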

         labels values         parents                           ids
 1:       total   1658            <NA>                         total
 2:     private    353           total               total - private
 3:      public   1120           total                total - public
 4:       mixed    185           total                 total - mixed
 5: residential    108 total - private total - private - residential
 6:  recreation    143 total - private  total - private - recreation
 7:  commercial    102 total - private  total - private - commercial
 8: residential    300  total - public  total - public - residential
 9:  recreation    320  total - public   total - public - recreation
10:  commercial    500  total - public   total - public - commercial
11: residential     37   total - mixed   total - mixed - residential
12:  recreation     58   total - mixed    total - mixed - recreation
13:  commercial     90   total - mixed    total - mixed - commercial

Here is the code to get there:

library(data.table)
library(plotly)

DF <- data.table(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)),
                  landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                  acres=c(108, 143, 102, 300, 320, 500, 37, 58, 90))

as.sunburstDF <- function(DF, valueCol = NULL){
  require(data.table)

  DT <- data.table(DF, stringsAsFactors = FALSE)
  DT[, root := "total"]
  setcolorder(DT, c("root", names(DF)))

  hierarchyList <- list()
  if(!is.null(valueCol)){setnames(DT, valueCol, "values", skip_absent=TRUE)}
  hierarchyCols <- setdiff(names(DT), "values")

  for(i in seq_along(hierarchyCols)){
    # Index into hierarchyCols so the result does not depend on where the value column sits
    currentCols <- hierarchyCols[1:i]
    if(is.null(valueCol)){
      currentDT <- unique(DT[, ..currentCols][, values := .N, by = currentCols], by = currentCols)
    } else {
      currentDT <- DT[, lapply(.SD, sum, na.rm = TRUE), by=currentCols, .SDcols = "values"]
    }
    setnames(currentDT, length(currentCols), "labels")
    hierarchyList[[i]] <- currentDT
  }

  hierarchyDT <- rbindlist(hierarchyList, use.names = TRUE, fill = TRUE)

  parentCols <- setdiff(names(hierarchyDT), c("labels", "values", valueCol))
  hierarchyDT[, parents := apply(.SD, 1, function(x){fifelse(all(is.na(x)), yes = NA_character_, no = paste(x[!is.na(x)], sep = ":", collapse = " - "))}), .SDcols = parentCols]
  hierarchyDT[, ids := apply(.SD, 1, function(x){paste(x[!is.na(x)], collapse = " - ")}), .SDcols = c("parents", "labels")]
  hierarchyDT[, c(parentCols) := NULL]
  return(hierarchyDT)
}

sunburstDF <- as.sunburstDF(DF, valueCol = "acres")

plot_ly(data = sunburstDF, ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
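
As a cross-check on the general function above, the same 13-row table can be assembled by hand for this two-level case. This is only a minimal sketch in base R: the names root_label, sep, and manualDF are illustrative choices, not part of plotly or data.table. It shows the three rules behind the format: every node gets a unique id built from its full path, every node's parent is the id of the level above it, and every non-leaf value is the sum of its children.

```r
# Hand-built sunburst table for a two-level hierarchy (base R only).
# `df` mirrors the question's data; `root_label` and `sep` are illustrative choices.
df <- data.frame(
  ownership = c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
  landuse   = rep(c("residential", "recreation", "commercial"), 3),
  acres     = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
  stringsAsFactors = FALSE
)

root_label <- "total"
sep <- " - "

# Level 0: a single root row; its parent is NA, as in the table shown earlier.
root <- data.frame(
  ids = root_label, labels = root_label, parents = NA_character_,
  values = sum(df$acres), stringsAsFactors = FALSE
)

# Level 1: one row per ownership, with acres summed over its land uses.
own <- aggregate(acres ~ ownership, data = df, FUN = sum)
level1 <- data.frame(
  ids     = paste(root_label, own$ownership, sep = sep),
  labels  = own$ownership,
  parents = root_label,
  values  = own$acres,
  stringsAsFactors = FALSE
)

# Level 2: one row per ownership/landuse pair; the parent is the level-1 id.
level2 <- data.frame(
  ids     = paste(root_label, df$ownership, df$landuse, sep = sep),
  labels  = df$landuse,
  parents = paste(root_label, df$ownership, sep = sep),
  values  = df$acres,
  stringsAsFactors = FALSE
)

manualDF <- rbind(root, level1, level2)
```

manualDF can then be passed to the same plot_ly call in place of sunburstDF and should draw an equivalent chart, although the row order differs from the function's output.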

Here is an example of the second data.frame format the function accepts (valueCol = NULL, because the values are calculated from the data as row counts):

DF2 <- data.frame(sample(LETTERS[1:3], 100, replace = TRUE),
                 sample(LETTERS[4:6], 100, replace = TRUE),
                 sample(LETTERS[7:9], 100, replace = TRUE),
                 sample(LETTERS[10:12], 100, replace = TRUE),
                 sample(LETTERS[13:15], 100, replace = TRUE),
                 stringsAsFactors = FALSE)

plot_ly(data = as.sunburstDF(DF2), ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
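
Both plot_ly calls set branchvalues = 'total', which tells plotly that each value already includes all of its descendants, so a parent's value should be at least the sum of its children's values. If a chart comes out empty, a small consistency check on the table may help narrow down the cause. The helper below is a hypothetical sketch (check_sunburst is not part of plotly or data.table); it uses base R only and assumes the ids, parents, and values columns shown above.

```r
# Hypothetical helper (not part of plotly or data.table): checks that a
# sunburst table is internally consistent for branchvalues = 'total'.
check_sunburst <- function(sb) {
  has_parent <- !is.na(sb$parents)
  # Every non-root parent must refer to an existing id.
  parents_exist <- all(sb$parents[has_parent] %in% sb$ids)
  # ids must be unique, even when labels repeat.
  ids_unique <- anyDuplicated(sb$ids) == 0
  # Sum of children per parent id, compared with that parent's own value
  # (a small tolerance allows for floating-point rounding).
  child_sums  <- tapply(sb$values[has_parent], sb$parents[has_parent], sum)
  parent_vals <- sb$values[match(names(child_sums), sb$ids)]
  sums_ok <- all(!is.na(parent_vals) & child_sums <= parent_vals + 1e-8)
  parents_exist && ids_unique && sums_ok
}

# Tiny example table in the same shape as the output shown earlier.
sb <- data.frame(
  ids     = c("total", "total - A", "total - B"),
  labels  = c("total", "A", "B"),
  parents = c(NA, "total", "total"),
  values  = c(10, 6, 4),
  stringsAsFactors = FALSE
)
check_sunburst(sb)
```

The same call can be applied to the result of as.sunburstDF(DF) or as.sunburstDF(DF2) before plotting.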

See also library(sunburstR).