Extract the year from a character vector of unformatted dates

Time: 2018-04-10 14:41:31

Tags: r date dplyr data-munging

I have a character vector that gives the year of coverage as unformatted dates. It looks like this:

     Period of coverage
1    1/1/2011 to 31/12/2011
2    1/1/2010 to 31/12/2010
3    1/1/2012 to 31/12/2012
4    1/1/2010 to 31/12/2010
5    1/1/2011 to 31/12/2011
6    1/1/2012 to 31/12/2012
7    1/1/2010 to 31/12/2010
8    1/1/2010 to 31/12/2010
9    1/1/2009 to 31/12/2009

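I would like to know how to convert the column into the year each observation represents. Every row has the same start and end day and month (1/1 and 31/12).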

3 answers:

Answer 0 (score: 1)

Assuming your data is stored in the variable shown below, and that all dates keep the same format, as you described:

period
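
If `period` is a character vector like the one in the question, one way to act on the fixed-format assumption is to read the last four characters of each string. This is only a sketch under that assumption, not necessarily the code this answer was built around; the sample values are copied from the question.

```r
# Assumed sample data in the question's format
period <- c("1/1/2011 to 31/12/2011",
            "1/1/2010 to 31/12/2010",
            "1/1/2009 to 31/12/2009")

# The year is always the last four characters of each string
n <- nchar(period)
year <- as.numeric(substr(period, n - 3, n))
year
```

This breaks as soon as a string has trailing whitespace or a two-digit year, so it is only suitable when the format really is constant.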

答案 1 :(得分:1)

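Assuming DF as shown reproducibly in the Note at the end, remove everything up to and including the last slash and convert the result to numeric: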

transform(DF, year = as.numeric(sub(".*/", "", `Period of coverage`)), check.names = FALSE)

giving:

      Period of coverage year
1 1/1/2011 to 31/12/2011 2011
2 1/1/2010 to 31/12/2010 2010
3 1/1/2012 to 31/12/2012 2012
4 1/1/2010 to 31/12/2010 2010
5 1/1/2011 to 31/12/2011 2011
6 1/1/2012 to 31/12/2012 2012
7 1/1/2010 to 31/12/2010 2010
8 1/1/2010 to 31/12/2010 2010
9 1/1/2009 to 31/12/2009 2009
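
The pattern `.*/` works because `.*` is greedy: the match runs up to the last slash in the string, and `sub` replaces that whole prefix with an empty string. A small sketch on a plain character vector (`x` and `stripped` are names made up for illustration):

```r
# Illustrative input in the question's format
x <- c("1/1/2011 to 31/12/2011", "1/1/2009 to 31/12/2009")

# Greedy match: everything up to and including the LAST "/" is dropped
stripped <- sub(".*/", "", x)

# What remains is the four-digit year as text; convert it to a number
year <- as.numeric(stripped)
year
```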

Another possibility is to convert it to the Date class first, noting that as.Date ignores junk at the end of the string:

to_year <- function(x, fmt) as.numeric(format(as.Date(x, fmt), "%Y"))
transform(DF, year = to_year(`Period of coverage`, "%d/%m/%Y"), check.names = FALSE)
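
The behaviour this relies on can be checked on a single string: `as.Date` parses the leading part that matches the format and disregards the rest, so the result is the start date. A quick sketch (the names `x`, `d` and `yr` are illustrative):

```r
x <- "1/1/2011 to 31/12/2011"

# Only the leading "1/1/2011" is consumed by the format; " to 31/12/2011" is ignored
d <- as.Date(x, format = "%d/%m/%Y")

# Pull the year back out of the Date as a number
yr <- as.numeric(format(d, "%Y"))
yr
```

Because the start and end dates always fall in the same year here, taking the year of the start date is enough.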

Note

Lines <- "     Period of coverage
1/1/2011 to 31/12/2011
1/1/2010 to 31/12/2010
1/1/2012 to 31/12/2012
1/1/2010 to 31/12/2010
1/1/2011 to 31/12/2011
1/1/2012 to 31/12/2012
1/1/2010 to 31/12/2010
1/1/2010 to 31/12/2010
1/1/2009 to 31/12/2009"
DF <- read.csv(text = Lines, check.names = FALSE, as.is = TRUE)

Answer 2 (score: 1)

If your strings always have the same format, you can simply take a substring and convert it to a date:

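# With only %Y in the format, as.Date fills in the current month and day
as.Date(substr("1/1/2011 to 31/12/2011", 5, 8), format = "%Y")    # year of the start date
as.Date(substr("1/1/2011 to 31/12/2011", 19, 22), format = "%Y")  # year of the end date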

If the strings vary more but are always separated by "to", you can split them with strsplit, unlist the result, and then format it as a year:

a <- "1/1/2011 to 31/12/2011"
a2 <- strsplit(a, " to ", fixed = TRUE)   # list holding c("1/1/2011", "31/12/2011")
a3 <- unlist(a2)
a4 <- as.Date(a3, format = "%d/%m/%Y")
year <- format(a4, format = "%Y")         # character vector: "2011" "2011"
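
For a whole column rather than a single string, the same split-and-parse idea can be applied element-wise. A minimal sketch, assuming a character vector named `period` (a hypothetical name) in the question's format, and taking the year from the end date:

```r
# Assumed sample data in the question's format
period <- c("1/1/2011 to 31/12/2011",
            "1/1/2010 to 31/12/2010",
            "1/1/2009 to 31/12/2009")

# Split each string on the literal separator " to "
parts <- strsplit(period, " to ", fixed = TRUE)

# Take the second piece (the end date) of every element
end_txt <- vapply(parts, `[`, character(1), 2)

# Parse as day/month/year and extract the year as an integer
end_dates <- as.Date(end_txt, format = "%d/%m/%Y")
years <- as.integer(format(end_dates, "%Y"))
years
```

Using `unlist` on the full list would interleave start and end dates, which is why this sketch picks one piece per element instead.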