Converting strings to dates in R

Time: 2017-09-22 14:58:06

Tags: r datetime

I have a column in a data.frame that contains dates in the following string formats (covering monthly, quarterly and annual data):

"2008Q1", "2008M1", "2008M2", "2008M3", "2008Q2", "2008M4", "2008M5", 
"2008M6", "2008Q3", "2008M7", "2008M8", "2008M9", "2008Q4", "2008M10", 
"2008M11", "2008M12", "2009", "2009Q1", "2009M1", "2009M2", "2009M3", 
"2009Q2", "2009M4", "2009M5", "2009M6", "2009Q3", "2009M7", "2009M8", 
"2009M9", "2009Q4", "2009M10", "2009M11", "2009M12", "2010"

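Is there an elegant and fast solution (the data.frame is very large) for converting this into two separate columns, one with the frequency and one with the date, like this: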

DFreq       Date
Quarterly   1/3/2008
Monthly     1/1/2008
Monthly     1/2/2008
Monthly     1/3/2008
...
Monthly     1/12/2008
Annual      1/12/2009

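3 answers:

Answer 0 (score: 3)

The frequency can be extracted with a little regex, and the strings can be parsed to dates with anytime::anydate (which inserts "01" for missing date components), but it parses every non-year number as a month, so a little cleanup is needed. In tidyverse syntax,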

library(tidyverse)
library(lubridate)

df <- data_frame(date = c("2008Q1", "2008M1", "2008M2", "2008M3", "2008Q2", "2008M4", "2008M5", 
                          "2008M6", "2008Q3", "2008M7", "2008M8", "2008M9", "2008Q4", "2008M10", 
                          "2008M11", "2008M12", "2009", "2009Q1", "2009M1", "2009M2", "2009M3", 
                          "2009Q2", "2009M4", "2009M5", "2009M6", "2009Q3", "2009M7", "2009M8", 
                          "2009M9", "2009Q4", "2009M10", "2009M11", "2009M12", "2010"))

df %>% 
    mutate(frequency = recode(gsub('\\d', '', date),    # remove all numbers...
                              'M' = 'Monthly',    # ...and recode as words
                              'Q' = 'Quarterly', 
                              .default = 'Annually'),
           date = anytime::anydate(date),    # parse to year-month
           date = {month(date) <- month(date) * recode(frequency,    # ...and correct the month
                                                       'Annually' = 12, 
                                                       'Quarterly' = 3, 
                                                       .default = 1); 
                   date})
#> # A tibble: 34 x 2
#>          date frequency
#>        <date>     <chr>
#>  1 2008-03-01 Quarterly
#>  2 2008-01-01   Monthly
#>  3 2008-02-01   Monthly
#>  4 2008-03-01   Monthly
#>  5 2008-06-01 Quarterly
#>  6 2008-04-01   Monthly
#>  7 2008-05-01   Monthly
#>  8 2008-06-01   Monthly
#>  9 2008-09-01 Quarterly
#> 10 2008-07-01   Monthly
#> # ... with 24 more rows

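This approach deliberately shifts the quarterly and annual data so that each date lines up with the first day of the last month of the period, as in the desired output in the question. Usually it is more useful to store the first day of the period, which you can get by using the versatility of lubridate::parse_date_time to build an appropriate parser for the mixed formats: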

df %>% 
    mutate(frequency = recode(gsub('\\d', '', date),
                              'M' = 'Monthly', 
                              'Q' = 'Quarterly', 
                              .default = 'Annually'),
           date = as_date(parse_date_time(
               date, 
               c('Ym', 'Yq', 'Y'),    # possible formats
               select_formats = function(dates){    # function to determine format
                   recode(gsub('\\%.[a-z]?', '', names(dates)), 
                          'M' = '%YM%m', 
                          'Q' = '%YQ%q', 
                          .default = '%Y')
               })))
#> # A tibble: 34 x 2
#>          date frequency
#>        <date>     <chr>
#>  1 2008-01-01 Quarterly
#>  2 2008-01-01   Monthly
#>  3 2008-02-01   Monthly
#>  4 2008-03-01   Monthly
#>  5 2008-04-01 Quarterly
#>  6 2008-04-01   Monthly
#>  7 2008-05-01   Monthly
#>  8 2008-06-01   Monthly
#>  9 2008-07-01 Quarterly
#> 10 2008-07-01   Monthly
#> # ... with 24 more rows
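
The frequency column in both snippets above comes from stripping the digits and recoding what is left. As a minimal sketch of that step in isolation, the following uses only base R; the names `x`, `code` and `labels` are illustrative, and the named-vector lookup is an assumed stand-in for `recode`, not part of the answer.

```r
# Illustrative input: a subset of the strings from the question
x <- c("2008Q1", "2008M10", "2009")

# Dropping every digit leaves only the period letter ("" for annual strings)
code <- gsub("\\d", "", x)

# Named lookup used here instead of recode(); the empty string is handled
# separately because indexing by an empty name returns NA
labels <- c(M = "Monthly", Q = "Quarterly")
frequency <- unname(ifelse(code == "", "Annually", labels[code]))

print(frequency)
```

Annual strings have no letter at all, which is why the answer relies on `.default` rather than a third named case.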

Answer 1 (score: 1)

I can't speak to its efficiency, but it gets the job done.

[code block not preserved in this copy]

Efficiency

Because I was curious, I ran these through a benchmark and came up with

Unit: milliseconds
      expr       min        lq      mean    median        uq       max neval
  benjamin 432.43466 433.31058 439.30987 439.20125 444.05267 448.95130    10
   pogibas 665.64618 718.50771 734.78987 745.73741 747.14000 767.26852    10
 alistaire  16.85593  17.13333  17.35033  17.31104  17.52041  17.92627    10

So I'd go with @alistaire's answer.

Answer 2 (score: 1)

Similar to Benjamin's solution (I'm using grep to find quarters or months) and paste0 to get the wanted format.

convertDate <- function(x) {
    # Default: annual data, dated to December of that year
    DFreq <- "Annual"
    Date <- paste0("1/12/", x)
    # Split into year and period number on the letter
    foo <- unlist(strsplit(x, "[A-Z]"))
    if (length(grep("Q", x)) == 1) {
        DFreq <- "Quarterly"
        Date <- paste0("1/", as.numeric(foo[2]) * 3, "/", foo[1])
    } else if (length(grep("M", x)) == 1) {
        DFreq <- "Monthly"
        Date <- paste0("1/", foo[2], "/", foo[1])
    }
    return(data.frame(DFreq, Date))
}

INPUT <- c("2008M5", "2009Q3", "2011")
res <- sapply(INPUT, convertDate, simplify = FALSE)
do.call("rbind", res)
#>            DFreq      Date
#> 2008M5   Monthly  1/5/2008
#> 2009Q3 Quarterly  1/9/2009
#> 2011      Annual 1/12/2011
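
Since the question mentions a very large data.frame, here is a rough vectorised sketch of the same mapping in base R, which avoids building one data.frame per element. `convert_periods` is a hypothetical helper, not taken from any of the answers, and it assumes every string starts with a four-digit year followed by nothing, `Q<n>` or `M<n>`.

```r
# Hypothetical helper (not from the answers): vectorised parsing of
# strings such as "2008Q1", "2008M10" and "2009"
convert_periods <- function(x) {
    year <- as.integer(substr(x, 1, 4))   # first four characters are the year
    code <- substr(x, 5, 5)               # "Q", "M", or "" for annual strings
    num  <- suppressWarnings(as.integer(substr(x, 6, nchar(x))))  # NA for annual

    # Last month of the period: quarter * 3, the month itself, or December
    month <- ifelse(code == "Q", num * 3L, ifelse(code == "M", num, 12L))
    freq  <- ifelse(code == "Q", "Quarterly",
                    ifelse(code == "M", "Monthly", "Annual"))

    data.frame(DFreq = freq,
               Date  = as.Date(sprintf("%d-%02d-01", year, month)),
               stringsAsFactors = FALSE)
}

res <- convert_periods(c("2008M5", "2009Q3", "2011"))
print(res)
```

The Date column here is a real Date rather than a d/m/yyyy string, so it sorts correctly and can be formatted afterwards with `format()` if a particular layout is wanted.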
