Separate dates from text in R

Time: 2017-08-15 21:59:01

Tags: r date

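I have a vector of strings, each containing a repeating pattern of start dates, end dates, and names of variables collected at a site. Here is the first entry: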

"1942-10-06,1996-03-31Snow Depth (in/mm)1942-11-01,1996-03-31Snowfall (in/mm)1942-10-01,1997-12-27Growing Degree DaysHeating Degree DaysAverage Temperature (F/C)Maximum Temperature (F/C)1950-08-01,1970-03-31Observation Time Temperature (F/C)1942-10-01,1997-12-27Minimum Temperature (F/C)1942-10-01,1996-03-31Precipitation (in/mm)"

Can someone help me reformat each string into a table with columns for the start date, end date, and variable name?

1 Answer

Answer 0 (score: 2)

The following code should work, given a few assumptions about how your data are formatted:

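  1. Your start dates are in "yyyy-mm-dd" format and are followed by a comma (the regex would also match "yyyy-dd-mm", but `as.Date()` below reads every date as yyyy-mm-dd),
  2. Your end dates are in the same format as the start dates and are preceded by a comma, and
  3. Your variable names follow the end dates and contain no digits.

As Oriol Mirosa implied, these assumptions may not hold for your data.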
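Before running the code, it may be worth checking whether each entry actually satisfies assumptions 1-3. A minimal sketch, assuming the stringr package; `follows_assumptions()` is a hypothetical helper, not part of the answer. It returns `TRUE` only when a whole string is made up of `start,end<name>` records, so an entry like the garbled `1997-12- 27` (stray space) is flagged:

```r
library(stringr)

# Hypothetical helper: TRUE when `x` consists only of
# "yyyy-mm-dd,yyyy-mm-dd<name>" records whose names contain no digits,
# i.e. when assumptions 1-3 hold for the whole string.
follows_assumptions <- function(x) {
  record <- "[0-9]{4}-[0-9]{2}-[0-9]{2},[0-9]{4}-[0-9]{2}-[0-9]{2}[^0-9]+"
  # Anchor at both ends so every character must belong to some record
  str_detect(x, paste0("^(?:", record, ")+$"))
}

follows_assumptions(c(
  "1942-10-06,1996-03-31Snow Depth (in/mm)1942-11-01,1996-03-31Snowfall (in/mm)",
  "1942-10-01,1997-12- 27Growing Degree Days"  # stray space inside a date
))
# expected: TRUE FALSE
```

Because `str_detect()` is vectorized, the same call works on your whole vector of strings, and `which(!follows_assumptions(your_vector))` would list the entries that need a closer look.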

    # Your string
    string <- "1942-10-06,1996-03-31Snow Depth (in/mm)1942-11-01,1996-03-31Snowfall (in/mm)1942-10-01,1997-12-27Growing Degree DaysHeating Degree DaysAverage Temperature (F/C)Maximum Temperature (F/C)1950-08-01,1970-03-31Observation Time Temperature (F/C)1942-10-01,1997-12-27Minimum Temperature (F/C)1942-10-01,1996-03-31Precipitation (in/mm)"
    
    # Extract text matching Assumptions 1-3, respectively, above
    library(stringr)
    start_dates <- str_extract_all(string, "[0-9]{4}-[0-9]{2}-[0-9]{2},")
    end_dates   <- str_extract_all(string, ",[0-9]{4}-[0-9]{2}-[0-9]{2}")
    # End date plus everything up to the next digit (the variable name)
    var_names   <- str_extract_all(string,
                                   ",[0-9]{4}-[0-9]{2}-[0-9]{2}[^0-9]+")
    
    # Remove the irrelevant bits (e.g., leading/trailing commas)
    start_dates <- as.Date(gsub(",", "", unlist(start_dates)))  # remove ","
    end_dates   <- as.Date(gsub(",", "", unlist(end_dates)))    # remove ","
    var_names   <- gsub(",[0-9]{4}-[0-9]{2}-[0-9]{2}", "", unlist(var_names))
    
    # Put into table (keep names as character, not factor, in R < 4.0)
    X <- data.frame(Start_date = start_dates,
                    End_date   = end_dates,
                    Var_name   = var_names,
                    stringsAsFactors = FALSE)
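The three `str_extract_all()` calls above are lined up only by position, so a single misplaced comma or digit can shift start dates, end dates, and names out of step. A sketch of an alternative (a swapped-in technique, not the answer's method), assuming the stringr package: one regex with three capture groups passed to `str_match_all()`, so each match yields one complete row, applied to every string in the vector. `parse_entry()`, `strings`, and the `Entry` column are hypothetical names, and the second example string is made up for illustration:

```r
library(stringr)

# Hypothetical input: first entry shortened from the question, second made up
strings <- c(
  "1942-10-06,1996-03-31Snow Depth (in/mm)1942-11-01,1996-03-31Snowfall (in/mm)",
  "1950-08-01,1970-03-31Observation Time Temperature (F/C)"
)

# Groups: 1 = start date, 2 = end date, 3 = name (everything up to next digit)
pattern <- "([0-9]{4}-[0-9]{2}-[0-9]{2}),([0-9]{4}-[0-9]{2}-[0-9]{2})([^0-9]+)"

# Hypothetical helper: turn one string into a data frame, one row per record
parse_entry <- function(s, id) {
  m <- str_match_all(s, pattern)[[1]]  # column 1 = full match, 2-4 = groups
  data.frame(
    Entry      = rep(id, nrow(m)),     # which string the row came from
    Start_date = as.Date(m[, 2], format = "%Y-%m-%d"),
    End_date   = as.Date(m[, 3], format = "%Y-%m-%d"),
    Var_name   = trimws(m[, 4]),
    stringsAsFactors = FALSE
  )
}

# Parse every string and stack the results into one table
X <- do.call(rbind, lapply(seq_along(strings),
                           function(i) parse_entry(strings[i], i)))
X
```

As with the original code, names that run together with no dates between them (such as "Growing Degree DaysHeating Degree Days..." in the question's string) still end up in a single `Var_name` cell.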