Question

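I'm currently developing a Shiny app for a client with a large dataset (a 1.5 GB CSV, compressed to a 150 MB RDS). I run into trouble every time the user changes an input: the slowest step, the data import, appears to be executed on every change. Here is a minimal example (the real app is somewhat more complex, but the problem is the same).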

UI.R (the basic RStudio example; nothing relevant here, just a select input and a ggplot):

library(shiny)

# Define the UI: one select input and one plot
shinyUI(fluidPage(

  # Application title
  titlePanel("Old Faithful Geyser Data"),

  # Sidebar with a select input for the Z value to filter on
  sidebarLayout(
    sidebarPanel(
      selectInput("select_z", "Z Value",
                  choices = seq(293.5, 443.5, 1), selected = 387.5)
    ),

    # Show a histogram of X for the selected Z value
    mainPanel(
       plotOutput("distPlot")
    )
  )
))

Server.R (the readRDS statement is outside the server function, plus a simple dplyr filter):

library(shiny)
library(dplyr)
library(magrittr)
library(ggplot2)

# Load the data once, when the app starts (outside the server function)
data <- readRDS('./data.rds')

# Define the server logic that draws the histogram
shinyServer(function(input, output) {

  output$distPlot <- renderPlot({

    # keep only the rows for the selected Z value
    filtered_data <- data %>% filter(Z == input$select_z)

    # draw a histogram of X for those rows
    ggplot(filtered_data)+
      geom_histogram(aes(X))

  })

})

The initial load takes about 10 seconds, which is fine; the problem is what happens every time the user changes the input.

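I tested the same steps in a non-reactive environment and the timings were much faster, which suggests that the only bottleneck is the data import; the remaining operations take less than a second.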

> system.time(readRDS('./data.rds'))
   user  system elapsed 
  3.121   0.396   3.524 
> system.time(filtered_data <- data %>% filter(Z == 384.5))
   user  system elapsed 
  0.048   0.011   0.059 
> system.time(ggplot(filtered_data)+geom_histogram(aes(X)))
   user  system elapsed 
  0.001   0.000   0.001

I think the problem is that the data import statement is executed every time the input changes, but I haven't found a way to prevent that.

Thanks

Answer 1

Ideally you would not load such a large file into memory at all, but use a database instead; see the storage options described on the RStudio website.
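
As a rough illustration of the database idea, here is a minimal sketch, assuming the DBI and RSQLite packages are installed. The table name `measurements`, the index name, the helper `fetch_slice` and the toy data are all hypothetical; only the column names `Z` and `X` come from the question.

```r
library(DBI)

# Toy stand-in for the real data (hypothetical values; the real table
# would be written from the CSV once, outside the app).
set.seed(42)
toy <- data.frame(
  Z = rep(seq(293.5, 443.5, 1), each = 50),
  X = rnorm(151 * 50)
)

# An in-memory SQLite database keeps the sketch self-contained; a real app
# would point at a file or a database server instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "measurements", toy)

# An index on Z lets the database find matching rows without a full scan.
dbExecute(con, "CREATE INDEX idx_measurements_z ON measurements (Z)")

# Fetch only the rows for one Z value, using a parameterised query.
# selectInput returns a character string, hence the as.numeric().
fetch_slice <- function(con, z) {
  dbGetQuery(con, "SELECT X FROM measurements WHERE Z = ?",
             params = list(as.numeric(z)))
}

slice <- fetch_slice(con, "384.5")
nrow(slice)

dbDisconnect(con)
```

Inside renderPlot, a call such as `fetch_slice(con, input$select_z)` would take the place of the dplyr filter, so only the rows needed for the current plot are ever held in memory.
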
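Using debounce may also improve the user interaction: the selectInput value is only passed on after it has stopped changing for a set amount of time: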

# assumes `data` has already been loaded at the top of Server.R, as in the question
shinyServer(function(input, output,session) {

  selection <- reactive({
    input$select_z
  })

  # only pass the selection on once it has been stable for 1 second
  selected_z <- selection %>% debounce(1000)

  output$distPlot <- renderPlot({
    # keep only the rows for the debounced Z value
    filtered_data <- data %>% filter(Z == selected_z())

    # draw a histogram of X for those rows
    ggplot(filtered_data)+
      geom_histogram(aes(X))
  })
})
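
If the data does have to stay in memory, one further option (not part of the approach above, just a sketch using base R) is to split the data frame by Z once at startup, so that each input change becomes a list lookup instead of a filter over the whole table. The names `data_by_z` and `get_slice` and the toy data are hypothetical.

```r
# Toy stand-in for the result of readRDS('./data.rds') (hypothetical values).
set.seed(1)
data <- data.frame(
  Z = rep(seq(293.5, 443.5, 1), each = 100),
  X = rnorm(151 * 100)
)

# Done once, outside the server function: one data frame per Z value,
# stored in a list whose names are the Z values as strings.
data_by_z <- split(data, data$Z)

# selectInput returns the selection as a character string, so it can be
# used as the list key; as.character() also accepts a numeric value.
get_slice <- function(z_key) data_by_z[[as.character(z_key)]]

slice <- get_slice("384.5")
nrow(slice)
```

In the server function, `get_slice(selected_z())` would then replace the `filter()` call. This trades some extra memory at startup for cheaper lookups, so it only makes sense when the number of distinct Z values is modest, as it is in the question's select input.
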

Shiny app computations are very slow for large datasets
