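I am very new to R and have a question that I hope someone knows the answer to.
I have a dataset (a small sample is shown here) that looks like this: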
structure(list(ID = 1:6, Date_Start = structure(c(7305, 7702,
12114, 9714, 11819, 8418), class = "Date"), Date_stop = structure(c(10957,
11355, 15767, 10111, 17422, 12949), class = "Date"), Start_Range_1 = structure(c(7731,
8068, 13304, 9863, NA, NA), class = "Date"), Stop_Range_1 = structure(c(8129,
8070, 13308, 9867, NA, NA), class = "Date"), Days_Range_1 = structure(c(3L,
2L, 4L, 4L, 1L, 1L), .Label = c("#VALUE!", "2", "398", "4"), class = "factor"),
Start_Range_2 = structure(c(8170, 8566, 13878, NA, 11853,
NA), class = "Date"), Stop_Range_2 = structure(c(8202, 8593,
13893, NA, 12072, NA), class = "Date"), Days_Range_2 = structure(c(5L,
4L, 2L, 1L, 3L, 1L), .Label = c("#VALUE!", "15", "219", "27",
"32"), class = "factor"), Start_Range_3 = structure(c(9544,
9869, 14029, NA, NA, NA), class = "Date"), Stop_Range_3 = structure(c(9573,
9870, 14042, NA, NA, NA), class = "Date"), Days_Range_3 = structure(c(4L,
2L, 3L, 1L, 1L, 1L), .Label = c("#VALUE!", "1", "13", "29"
), class = "factor"), Start_Range_4 = structure(c(9993, 10267,
14621, NA, NA, NA), class = "Date"), Stop_Range_4 = structure(c(9996,
10272, 14625, NA, NA, NA), class = "Date"), Days_Range_4 = structure(c(2L,
4L, 3L, 1L, 1L, 1L), .Label = c("#VALUE!", "3", "4", "5"), class = "factor")), .Names = c("ID",
"Date_Start", "Date_stop", "Start_Range_1", "Stop_Range_1", "Days_Range_1",
"Start_Range_2", "Stop_Range_2", "Days_Range_2", "Start_Range_3",
"Stop_Range_3", "Days_Range_3", "Start_Range_4", "Stop_Range_4",
"Days_Range_4"), row.names = c(NA, -6L), class = "data.frame")
'data.frame': 6 obs. of 15 variables:
$ ID : int 1 2 3 4 5 6
$ Date_Start : Date, format: "1990-01-01" "1991-02-02" ...
$ Date_stop : Date, format: "2000-01-01" "2001-02-02" ...
$ Start_Range_1: Date, format: "1991-03-03" "1992-02-03" ...
$ Stop_Range_1 : Date, format: "1992-04-04" "1992-02-05" ...
$ Days_Range_1 : Factor w/ 4 levels "#VALUE!","2",..: 3 2 4 4 1 1
$ Start_Range_2: Date, format: "1992-05-15" "1993-06-15" ...
$ Stop_Range_2 : Date, format: "1992-06-16" "1993-07-12" ...
$ Days_Range_2 : Factor w/ 5 levels "#VALUE!","15",..: 5 4 2 1 3 1
$ Start_Range_3: Date, format: "1996-02-18" "1997-01-08" ...
$ Stop_Range_3 : Date, format: "1996-03-18" "1997-01-09" ...
$ Days_Range_3 : Factor w/ 4 levels "#VALUE!","1",..: 4 2 3 1 1 1
$ Start_Range_4: Date, format: "1997-05-12" "1998-02-10" ...
$ Stop_Range_4 : Date, format: "1997-05-15" "1998-02-15" ...
$ Days_Range_4 : Factor w/ 4 levels "#VALUE!","3",..: 2 4 3 1 1 1
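For each unique ID, I would like to generate columns for all quarters between Date_Start and Date_stop (that can still be around 95-100 quarters for one ID, but the number differs per ID).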
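To make the first step concrete, here is a minimal sketch in base R of listing the quarters per ID. It assumes that "quarter" means a three-month block counted from each ID's own Date_Start rather than a calendar quarter. The names `df` and `quarters_long` are made up for illustration, and only the first two rows of the sample are used.

```r
# Two rows copied from the sample data (ID 1 and ID 2)
df <- data.frame(
  ID         = 1:2,
  Date_Start = as.Date(c("1990-01-01", "1991-02-02")),
  Date_stop  = as.Date(c("2000-01-01", "2001-02-02"))
)

# One row per ID and quarter; quarters are counted from that ID's Date_Start
quarters_long <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  starts <- seq(df$Date_Start[i], df$Date_stop[i], by = "quarter")
  # drop a quarter that would begin exactly on (or after) Date_stop
  starts <- starts[starts < df$Date_stop[i]]
  data.frame(ID = df$ID[i],
             Quarter = seq_along(starts),
             Quarter_Start = starts)
}))

head(quarters_long)
```

A long table like this (one row per ID and quarter) is usually easier to work with than roughly 100 columns per ID. It can be reshaped to wide format afterwards if columns are really needed.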
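In addition, I would like to know how many days of each date range (Start_Range_1, Stop_Range_1, Days_Range_1, and so on) fall within each of these quarters, where the quarters are counted from each unique ID's Date_Start. Sometimes a range spans two quarters. In that case I want to know how many days of the range fall in the first quarter it touches and how many fall in the second. So in the end I need the total number of days per quarter over all date ranges (30 in total).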
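The day counting described above can be sketched as an overlap between a range and each quarter. This is only a sketch under a few assumptions: quarters are counted from the ID's Date_Start, and days are counted as Stop minus Start, which is what the Days_Range_* columns appear to contain (for ID 1, 8129 - 7731 = 398). The function name `days_per_quarter` is hypothetical.

```r
# Hypothetical helper: days of the range [r_start, r_stop) that fall in each
# quarter, with quarters counted from the ID's Date_Start.
days_per_quarter <- function(date_start, date_stop, r_start, r_stop) {
  # rows without a range (NA dates) contribute nothing
  if (is.na(r_start) || is.na(r_stop)) return(numeric(0))
  breaks <- seq(date_start, date_stop, by = "quarter")
  # close the last (possibly partial) quarter at Date_stop
  if (breaks[length(breaks)] < date_stop) breaks <- c(breaks, date_stop)
  q_start <- as.numeric(breaks[-length(breaks)])
  q_end   <- as.numeric(breaks[-1])
  # overlap of [r_start, r_stop) with every quarter, never below zero
  lo <- pmax(q_start, as.numeric(r_start))
  hi <- pmin(q_end, as.numeric(r_stop))
  days <- pmax(hi - lo, 0)
  names(days) <- paste0("Q", seq_along(days))
  days
}

# ID 1, range 1 from the sample data
d <- days_per_quarter(as.Date("1990-01-01"), as.Date("2000-01-01"),
                      as.Date("1991-03-03"), as.Date("1992-04-04"))
d[d > 0]   # quarters touched by this range
sum(d)     # 398, the same as Days_Range_1 for ID 1
```

Applying this to every Start_Range_k / Stop_Range_k pair of an ID and adding up the resulting vectors would give the total days per quarter for that ID. Ranges that lie partly outside Date_Start to Date_stop are simply clipped in this sketch.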
Ideally, the output would look like this:
I hope someone can help me! Many thanks in advance.