I have a column in a data.frame that contains dates in the following string formats (covering monthly, quarterly, and annual data):
"2008Q1", "2008M1", "2008M2", "2008M3", "2008Q2", "2008M4", "2008M5",
"2008M6", "2008Q3", "2008M7", "2008M8", "2008M9", "2008Q4", "2008M10",
"2008M11", "2008M12", "2009", "2009Q1", "2009M1", "2009M2", "2009M3",
"2009Q2", "2009M4", "2009M5", "2009M6", "2009Q3", "2009M7", "2009M8",
"2009M9", "2009Q4", "2009M10", "2009M11", "2009M12", "2010"
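Is there an elegant and fast way (the data.frame is very large) to convert this into two separate columns, one holding the frequency and one holding the date, like this: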
DFreq Date
Quarterly 1/3/2008
Monthly 1/1/2008
Monthly 1/2/2008
Monthly 1/3/2008
...
Monthly 1/12/2008
Annual 1/12/2009
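Answer 0 (score: 3)
The frequency can be extracted with a little regex, and the strings can be parsed to dates with anytime::anydate (which inserts "01" for missing date components). However, it parses every non-year number as a month, so a little cleanup is needed. In tidyverse syntax,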
library(tidyverse)
library(lubridate)
# tibble() replaces the deprecated data_frame()
df <- tibble(date = c("2008Q1", "2008M1", "2008M2", "2008M3", "2008Q2", "2008M4", "2008M5",
                      "2008M6", "2008Q3", "2008M7", "2008M8", "2008M9", "2008Q4", "2008M10",
                      "2008M11", "2008M12", "2009", "2009Q1", "2009M1", "2009M2", "2009M3",
                      "2009Q2", "2009M4", "2009M5", "2009M6", "2009Q3", "2009M7", "2009M8",
                      "2009M9", "2009Q4", "2009M10", "2009M11", "2009M12", "2010"))
df %>%
    mutate(frequency = recode(gsub('\\d', '', date),    # remove all numbers...
                              'M' = 'Monthly',          # ...and recode as words
                              'Q' = 'Quarterly',
                              .default = 'Annually'),
           date = anytime::anydate(date),    # parse to year-month
           date = {month(date) <- month(date) * recode(frequency,    # ...and correct the month
                                                       'Annually' = 12,
                                                       'Quarterly' = 3,
                                                       .default = 1);
                   date})
#> # A tibble: 34 x 2
#> date frequency
#> <date> <chr>
#> 1 2008-03-01 Quarterly
#> 2 2008-01-01 Monthly
#> 3 2008-02-01 Monthly
#> 4 2008-03-01 Monthly
#> 5 2008-06-01 Quarterly
#> 6 2008-04-01 Monthly
#> 7 2008-05-01 Monthly
#> 8 2008-06-01 Monthly
#> 9 2008-09-01 Quarterly
#> 10 2008-07-01 Monthly
#> # ... with 24 more rows
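To see the regex step on its own, here is a minimal base R sketch of the frequency extraction. The names x, code, lookup and freq are made up for illustration, and the named lookup vector is just one possible stand-in for recode(), not part of the answer above.

```r
x <- c("2008Q1", "2008M1", "2009", "2009M12")

# Strip every digit; what remains is "Q", "M", or "" (annual)
code <- gsub("\\d", "", x)

# Hypothetical lookup table from the period letter to a label
lookup <- c(M = "Monthly", Q = "Quarterly")
freq <- unname(lookup[code])

# An empty code matches no name in the lookup, so it comes back NA: treat it as annual
freq[is.na(freq)] <- "Annually"

print(data.frame(date = x, frequency = freq))
```

Everything here is vectorized, so it should scale to a large column without a per-row loop.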
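This approach shifts the quarterly and annual values so that each date falls on the first day of the last month of the period, as in the desired output in the question. Often it is actually more useful to store the first day of the period, which you can get by taking advantage of the extreme versatility of lubridate::parse_date_time to build an appropriate parser for the mixed formats: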
df %>%
    mutate(frequency = recode(gsub('\\d', '', date),
                              'M' = 'Monthly',
                              'Q' = 'Quarterly',
                              .default = 'Annually'),
           date = as_date(parse_date_time(
               date,
               c('Ym', 'Yq', 'Y'),    # possible formats
               select_formats = function(dates){    # function to determine format
                   recode(gsub('\\%.[a-z]?', '', names(dates)),
                          'M' = '%YM%m',
                          'Q' = '%YQ%q',
                          .default = '%Y')
               })))
#> # A tibble: 34 x 2
#> date frequency
#> <date> <chr>
#> 1 2008-01-01 Quarterly
#> 2 2008-01-01 Monthly
#> 3 2008-02-01 Monthly
#> 4 2008-03-01 Monthly
#> 5 2008-04-01 Quarterly
#> 6 2008-04-01 Monthly
#> 7 2008-05-01 Monthly
#> 8 2008-06-01 Monthly
#> 9 2008-07-01 Quarterly
#> 10 2008-07-01 Monthly
#> # ... with 24 more rows
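If you would rather avoid the extra packages, the same "first day of the period" result can be sketched in base R. This is only an illustration under the assumption that every string is a four-digit year optionally followed by M or Q and a period number; period_start is a hypothetical helper name, not something from lubridate.

```r
# Hypothetical helper: first day of the period for strings like "2008Q2", "2008M10", "2009"
period_start <- function(x) {
  year <- substr(x, 1, 4)
  type <- gsub("\\d", "", x)                       # "M", "Q", or "" for annual
  num  <- as.integer(sub("^\\d{4}[MQ]?", "", x))   # period number, NA for annual

  # Months map to themselves, quarter q starts in month 3q - 2, years start in January
  month <- ifelse(type == "M", num,
                  ifelse(type == "Q", 3L * num - 2L, 1L))

  as.Date(sprintf("%s-%02d-01", year, month))
}

period_start(c("2008Q1", "2008Q2", "2008M10", "2009"))
```

To get the last month of the period instead, as in the question, the quarter branch would become 3L * num and the annual branch 12L.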
Answer 1 (score: 1)
I can't speak to its efficiency, but it gets the job done.
Because I was curious, I benchmarked the answers and came up with
Unit: milliseconds
      expr       min        lq      mean    median        uq       max neval
  benjamin 432.43466 433.31058 439.30987 439.20125 444.05267 448.95130    10
   pogibas 665.64618 718.50771 734.78987 745.73741 747.14000 767.26852    10
 alistaire  16.85593  17.13333  17.35033  17.31104  17.52041  17.92627    10
So I'd go with @alistaire's answer.
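A timing table in this layout (min, lq, mean, median, uq, max, neval) is what the microbenchmark package prints. The benchmark code itself is not shown in this answer, so the following is only a sketch of how such a comparison might be set up. It assumes microbenchmark is installed, and strip_gsub and strip_substr are two hypothetical candidates invented for the example, not the functions from the answers.

```r
library(microbenchmark)

# Toy input: repeat a few period strings to get a longer vector
x <- rep(c("2008Q1", "2008M1", "2009"), times = 1000)

# Two hypothetical ways to pull out the period letter ("Q", "M", or "")
strip_gsub   <- function(x) gsub("\\d", "", x)
strip_substr <- function(x) substr(x, 5, 5)

# Run each expression 10 times, matching neval = 10 in the table above
res <- microbenchmark(
  gsub   = strip_gsub(x),
  substr = strip_substr(x),
  times  = 10
)
print(res)
```

The absolute numbers depend on the machine and input size, so only the relative ordering is worth reading much into.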
Answer 2 (score: 1)
Similar to Benjamin's solution: I use grep to detect Quarters or Months, and paste0 to get the wanted format.
convertDate <- function(x) {
    # Default: annual data, dated to December of that year
    DFreq <- "Annual"
    Date <- paste0("1/12/", x)
    # Split on the period letter: foo[1] is the year, foo[2] the period number
    foo <- unlist(strsplit(x, "[A-Z]"))
    if (length(grep("Q", x)) == 1) {
        DFreq <- "Quarterly"
        # Last month of the quarter
        Date <- paste0("1/", as.numeric(foo[2]) * 3, "/", foo[1])
    } else if (length(grep("M", x)) == 1) {
        DFreq <- "Monthly"
        Date <- paste0("1/", foo[2], "/", foo[1])
    }
    return(data.frame(DFreq, Date))
}
INPUT <- c("2008M5", "2009Q3", "2011")
res <- sapply(INPUT, convertDate, simplify = FALSE)
do.call("rbind", res)
           DFreq      Date
2008M5   Monthly  1/5/2008
2009Q3 Quarterly  1/9/2009
2011      Annual 1/12/2011
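Since the question mentions a very large data.frame, building one small data.frame per element and binding them together may get slow. Here is a vectorized sketch of the same logic. convert_dates is a hypothetical name, and it assumes the same input shapes as above (a year, optionally followed by Q or M and a number).

```r
# Hypothetical vectorized variant of the per-element conversion
convert_dates <- function(x) {
  parts <- strsplit(x, "[A-Z]")
  year  <- vapply(parts, `[`, character(1), 1)
  # Second piece is the period number; NA when the string is just a year
  num   <- as.integer(vapply(parts, `[`, character(1), 2))

  is_q <- grepl("Q", x, fixed = TRUE)
  is_m <- grepl("M", x, fixed = TRUE)

  DFreq <- ifelse(is_q, "Quarterly", ifelse(is_m, "Monthly", "Annual"))
  # Last month of the period: quarter * 3, the month itself, or December
  month <- ifelse(is_q, num * 3L, ifelse(is_m, num, 12L))

  data.frame(DFreq = DFreq,
             Date  = paste0("1/", month, "/", year),
             stringsAsFactors = FALSE)
}

convert_dates(c("2008M5", "2009Q3", "2011"))
```

The result has the same DFreq and Date values as the sapply version, just without the input strings as row names.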