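I often need to upload large (~700 MB) CSV files to my Shiny app. The problem is that it shows "Upload complete" after about 3 seconds, whereas loading the data actually takes about 20 seconds (confirmed by printing a few rows of the data).
Is there a workaround?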
ui <- fluidPage(
  titlePanel("Predictive Models"),
  # Sidebar layout with input and output definitions ----
  sidebarLayout(
    # Sidebar panel for inputs ----
    sidebarPanel(
      # Input: Select a file ----
      fileInput("file1", "Choose CSV File",
                multiple = FALSE,
                accept = c("text/csv",
                           "text/comma-separated-values,text/plain",
                           ".csv"),
                width = "80%")
...
server <- function(input, output) {
  values <- reactiveValues(df_data = NULL, station_id = NULL, station_name = NULL,
                           station_data = NULL, processed_data = NULL, df = NULL)
  observeEvent(input$file1, {
    # Parse the uploaded temp file; this blocks until the whole file is read.
    values$df_data <- read.csv(input$file1$datapath)
    output$sum <- renderPrint({
      print(head(values$df_data, 10))
    })
  })
}
Answer 0 (score: 2)
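Getting an uploaded file into the app takes two steps.
The upload bar shown by fileInput only measures the time needed to transfer the file to the server and write it to a temp directory. It does not include the time needed to read the file into memory.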
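To see that the read step is separate from the upload, one can time read.csv() on its own. The sketch below is only an illustration outside Shiny: time_csv_read is a hypothetical helper, and the generated temp file stands in for the path that fileInput exposes as input$file1$datapath.

```r
# Hypothetical helper: time how long it takes to parse a CSV that is
# already on disk (as an uploaded file is once the upload bar completes).
time_csv_read <- function(path) {
  elapsed <- system.time(
    data <- read.csv(path, stringsAsFactors = FALSE)
  )[["elapsed"]]
  list(data = data, seconds = elapsed)
}

# Stand-in for the temp file behind input$file1$datapath (an assumption
# made so the example runs without a Shiny session).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), path, row.names = FALSE)

result <- time_csv_read(path)
cat("rows:", nrow(result$data), "read time (s):", result$seconds, "\n")
```

For a 700 MB file, this parse time is the part that the upload bar does not cover.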
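Since read.csv() blocks the server until the operation completes, the only way to track the time spent reading the file into memory is to read the file in batches. At each step, we use Progress to report progress.
Here is an example. It is not the most efficient code.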
library(shiny)

# Raise the upload size limit to 800 MB (the Shiny default is 5 MB).
options(shiny.maxRequestSize = 800 * 1024^2)

ui <- fluidPage(
  titlePanel("Predictive Models"),
  # Sidebar layout with input and output definitions ----
  sidebarLayout(
    # Sidebar panel for inputs ----
    sidebarPanel(
      # Input: Select a file ----
      fileInput("file1", "Choose CSV File",
                multiple = FALSE,
                accept = c("text/csv",
                           "text/comma-separated-values,text/plain",
                           ".csv"),
                width = "80%")
    ),
    mainPanel(verbatimTextOutput("sum"))
  )
)

server <- function(input, output, session) {
  # Read a CSV in `no_batches` chunks, updating a progress bar before each chunk.
  # `nrows` is the total number of lines in the file, including the header.
  read_batch_with_progress <- function(file_path, nrows, no_batches) {
    progress <- Progress$new(session, min = 1, max = no_batches)
    # Close the progress bar even if a read fails.
    on.exit(progress$close())
    progress$set(message = "Reading ...")

    # Number of lines to skip before each batch. The first batch starts after
    # the header and the first data row, which are read separately below.
    seq_length <- ceiling(seq.int(from = 2, to = nrows - 2, length.out = no_batches + 1))
    seq_length <- seq_length[-length(seq_length)]

    # Read the header and the first data row.
    df <- read.csv(file_path, skip = 0, nrows = 1, stringsAsFactors = FALSE)
    col_names <- colnames(df)

    for (i in seq_along(seq_length)) {
      progress$set(value = i)
      # The last batch reads to the end of the file (nrows = -1).
      if (i == no_batches) chunk_size <- -1 else chunk_size <- seq_length[i + 1] - seq_length[i]
      df_temp <- read.csv(file_path, skip = seq_length[i], nrows = chunk_size,
                          header = FALSE, stringsAsFactors = FALSE)
      colnames(df_temp) <- col_names
      df <- rbind(df, df_temp)
    }
    df
  }

  df <- reactive({
    req(input$file1)
    # Count the lines in the uploaded temp file.
    n_rows <- length(count.fields(input$file1$datapath))
    read_batch_with_progress(input$file1$datapath, n_rows, 10)
  })

  output$sum <- renderPrint({
    print(head(df(), 10))
  })
}

shinyApp(ui, server)
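This code splits the file into 10 chunks, reads each chunk into memory, and appends it to the chunks read so far. At each step, it uses progress$set(value = i) to update the progress bar.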
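Since the example is not the most efficient code (each batch re-scans the file from the top to skip earlier rows, and rbind grows the data frame repeatedly), one possible alternative is to read from a single open connection. The following is a minimal sketch, not a drop-in replacement: read_csv_in_chunks and its on_chunk callback are hypothetical names, and it assumes a comma-separated file whose first line is the header and whose fields contain no embedded newlines. Inside a Shiny server, on_chunk would be the place to call progress$set().

```r
# Hypothetical helper: read a CSV chunk by chunk from one open connection,
# calling on_chunk(rows_so_far) after each chunk so a caller can report progress.
# Assumes no quoted field spans more than one line.
read_csv_in_chunks <- function(path, chunk_rows = 1000L,
                               on_chunk = function(rows_so_far) NULL) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)

  # The first line holds the column names.
  header <- scan(text = readLines(con, n = 1L), what = "", sep = ",", quiet = TRUE)

  chunks <- list()
  rows_so_far <- 0L
  repeat {
    # readLines continues from where the previous call stopped.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    chunks[[length(chunks) + 1L]] <- chunk
    rows_so_far <- rows_so_far + nrow(chunk)
    on_chunk(rows_so_far)
  }
  # Combine all chunks once at the end instead of growing a data frame.
  do.call(rbind, chunks)
}

# Usage on a small generated file, printing progress after each chunk.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, x = letters[1:25]), tmp, row.names = FALSE)
df_chunks <- read_csv_in_chunks(tmp, chunk_rows = 10L,
                                on_chunk = function(n) cat("rows read:", n, "\n"))
```

Because column types are inferred per chunk, a column that looks numeric in one chunk and textual in another is coerced when the chunks are combined. Passing colClasses to read.csv would avoid that if the types are known in advance.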