Pandas DataFrame / HDFStore: passing multiple date formats via CSV

Time: 2017-01-10 15:26:31

Tags: python datetime pandas dataframe

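I am doing the following to parse dates in several different columns. However, the second column (Time) does not match the format string, so it raises an error. How can I achieve this?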

 from datetime import datetime

 # This format matches the Date column but not the Time column, hence the error.
 dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M:%S')

 for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['date','time'], date_parser=dateparse, names=col_names, index_col=index_cols, header=0, dtype=dtype):
     store.append('df', chunk)

Sample data:

 Date                     Time
19/10/2016 00:00:00      00:05:01

2 answers:

Answer 0 (score: 2)

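If your dates use a standard format such as '19/10/2016 00:00:00', you do not need to specify the datetime format. Pandas will parse it automatically, so you do not need the date_parser parameter.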
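One caveat worth illustrating: automatic parsing is only unambiguous when the day is greater than 12. The sketch below uses made-up strings (the second one is not from the question) to show how `dayfirst=True` changes the result for an ambiguous day/month pair.

```python
import pandas as pd

# Day 19 cannot be a month, so this string is unambiguous;
# dayfirst=True simply makes the intent explicit.
unambiguous = pd.to_datetime('19/10/2016 00:00:00', dayfirst=True)

# '01/02/2016' is ambiguous: by default pandas reads it month-first.
month_first = pd.to_datetime('01/02/2016 00:00:00')
day_first = pd.to_datetime('01/02/2016 00:00:00', dayfirst=True)

print(unambiguous, month_first, day_first)
```

If the whole file is day-first, passing `dayfirst=True` to `pd.read_csv` alongside `parse_dates` is probably the safer choice.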

Option 1: convert the Time column to the datetime64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Time'] = chunk['Date'].dt.normalize() + pd.to_timedelta(chunk['Time'])
    store.append('df',chunk)
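A self-contained sketch of Option 1, assuming a small in-memory CSV shaped like the sample data (the second row, the `chunksize=1` setting, and collecting chunks in a list instead of an HDFStore are illustrative choices, not part of the original answer):

```python
import io
import pandas as pd

# Hypothetical CSV mirroring the sample data from the question.
csv_text = """Date,Time
19/10/2016 00:00:00,00:05:01
20/10/2016 00:00:00,13:45:30
"""

chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1,
                         parse_dates=['Date'], dayfirst=True):
    # Strip any time-of-day from Date, then add Time as an offset,
    # giving a full timestamp in the Time column.
    chunk['Time'] = chunk['Date'].dt.normalize() + pd.to_timedelta(chunk['Time'])
    chunks.append(chunk)

result = pd.concat(chunks, ignore_index=True)
print(result)
```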

Option 2: convert the Time column to the timedelta64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Time'] = pd.to_timedelta(chunk['Time'])
    store.append('df',chunk)
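Keeping Time as a timedelta leaves the duration arithmetic available and still lets you build a full timestamp later. A minimal sketch, assuming the same column names as the sample data and an in-memory CSV:

```python
import io
import pandas as pd

# Hypothetical one-row CSV matching the sample data.
csv_text = "Date,Time\n19/10/2016 00:00:00,00:05:01\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)
df['Time'] = pd.to_timedelta(df['Time'])

# A timedelta column supports duration arithmetic directly.
seconds = df['Time'].dt.total_seconds()

# Date is at midnight here, so adding the offset yields the full timestamp.
full_timestamp = df['Date'] + df['Time']
print(seconds.iloc[0], full_timestamp.iloc[0])
```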

PS: HDFStore supports both dtypes.

Option 3: read the columns as-is and convert them afterwards, coercing unparseable values to NaT:

for chunk in pd.read_csv(file, chunksize=500000, names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Date'] = pd.to_datetime(chunk['Date'], errors='coerce')
    chunk['Time'] = pd.to_timedelta(chunk['Time'], errors='coerce')
    store.append('df',chunk)
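The effect of `errors='coerce'` is easiest to see with deliberately bad input. In this sketch the malformed second row is invented for illustration, and the explicit `format` is an assumption that the Date column always follows the sample layout:

```python
import pandas as pd

# Second row is intentionally malformed (hypothetical data).
raw = pd.DataFrame({
    'Date': ['19/10/2016 00:00:00', 'not a date'],
    'Time': ['00:05:01', 'bad value'],
})

# Unparseable entries become NaT instead of raising an exception.
raw['Date'] = pd.to_datetime(raw['Date'], format='%d/%m/%Y %H:%M:%S',
                             errors='coerce')
raw['Time'] = pd.to_timedelta(raw['Time'], errors='coerce')

# Rows that failed to parse can be inspected before storing the chunk.
bad_rows = raw[raw['Date'].isna() | raw['Time'].isna()]
print(bad_rows)
```

Coercion hides bad data silently, so checking `bad_rows` before appending to the store is a reasonable precaution.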

Answer 1 (score: 1)

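You can tell Pandas to combine the date and time columns into a single column by passing a list of lists rather than a plain list, as described in the read_csv documentation: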

  

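parse_dates : boolean or list of ints or names or list of lists or dict, default False

  • boolean. If True -> try parsing the index.
  • list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
  • list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
  • dict, e.g. {'foo': [1, 3]} -> parse columns 1, 3 as date and call the result 'foo'.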
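As a small illustration of the "list of names" form quoted above, the sketch below parses two columns independently. The column names `start` and `end` and the ISO-formatted values are hypothetical and chosen to avoid any day/month ambiguity.

```python
import io
import pandas as pd

# Hypothetical CSV with two independent datetime columns.
csv_text = "start,end,value\n2016-10-19 00:00:00,2016-10-19 00:05:01,1\n"

# A flat list of names parses each listed column as its own date column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['start', 'end'])

# Both columns are datetimes, so subtracting them gives a timedelta.
elapsed = df['end'] - df['start']
print(df.dtypes)
print(elapsed.iloc[0])
```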

You will also want to specify the parsing option that matches your date format.

This means your code becomes:

parse_dates
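A version-independent way to get a single combined column is to join the two columns after reading. This is only a sketch, not the answerer's original code. It assumes the Date column holds a bare date (unlike the sample data, which already carries a 00:00:00 time part), and the `Timestamp` column name is made up. Combining columns inside `read_csv` may not be available in every pandas version, which is why the join is done explicitly here.

```python
import io
import pandas as pd

# Hypothetical CSV where Date holds only the date part.
csv_text = "Date,Time\n19/10/2016,00:05:01\n20/10/2016,13:45:30\n"

# Read both columns as plain strings, then join and parse them once
# with an explicit day-first format.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                 format='%d/%m/%Y %H:%M:%S')
print(df)
```

For data shaped like the question's sample, where Date already includes a midnight time, adding the Time column as a timedelta is likely the simpler route.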