Creating a balanced dataset

Posted: 2018-02-17 10:51:12

Tags: r panel

I am using R and have a long-format dataset like the following:

Date           ID     Status
2014-10-01     12      1
2015-04-01     12      1
2015-07-01     12      1
2015-09-01     12      1
2015-11-01     12      0
2016-01-01     12      0
2016-05-01     12      0
2016-08-01     12      1
2017-03-01     12      1
2017-05-01     12      1
2014-10-01     13      1
2015-04-01     13      1
2015-07-01     13      0
2015-11-01     14      0
2016-01-01     14      0
...

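My goal is to create a "balanced" dataset, i.e. every ID should appear at each of the 10 dates. For observations that did not originally occur, the variable "Status" should be set to N/A. In other words, the result should look like this: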

Date           ID     Status
2014-10-01     12      1
2015-04-01     12      1
2015-07-01     12      1
2015-09-01     12      1
2015-11-01     12      0
2016-01-01     12      0
2016-05-01     12      0
2016-08-01     12      1
2017-03-01     12      1
2017-05-01     12      1
2014-10-01     13      1
2015-04-01     13      1
2015-07-01     13      0
2015-09-01     13      N/A
2015-11-01     13      N/A
2016-01-01     13      N/A
2016-05-01     13      N/A
2016-08-01     13      N/A
2017-03-01     13      N/A
2017-05-01     13      N/A
2014-10-01     14      N/A
2015-04-01     14      N/A
2015-07-01     14      N/A
2015-09-01     14      N/A
2015-11-01     14      0
2016-01-01     14      0
2016-05-01     14      N/A
2016-08-01     14      N/A
2017-03-01     14      N/A
2017-05-01     14      N/A
...
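The "balanced" requirement can be stated as a check: in the cross-tabulation of ID against Date, every cell should equal exactly 1. A minimal base-R sketch; the helper name is_balanced and its column-name defaults are hypothetical, chosen to match the columns above, and the toy data are made up for illustration:

```r
# Hypothetical helper: TRUE when every ID occurs exactly once at every date.
# table() cross-tabulates ID against Date; a missing ID/date pair shows up as 0.
is_balanced <- function(d, id = "ID", time = "Date") {
  counts <- table(d[[id]], d[[time]])
  all(counts == 1)
}

# Toy data: ID 13 is missing the 2015-04-01 observation
unbalanced <- data.frame(
  Date   = c("2014-10-01", "2015-04-01", "2014-10-01"),
  ID     = c(12L, 12L, 13L),
  Status = c(1L, 0L, 1L)
)

# Adding the missing pair with Status = NA makes the panel balanced
balanced <- rbind(unbalanced,
                  data.frame(Date = "2015-04-01", ID = 13L, Status = NA_integer_))

is_balanced(unbalanced)  # FALSE
is_balanced(balanced)    # TRUE
```

If Date is a factor, table() uses all of its levels, so a date that no ID observed is also reported as a zero count.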

Thanks for your help!

Answers:

Answer 0 (score: 1)

Here is an approach using the tidyverse:

library(tidyverse)
df %>%
  group_by(ID) %>%
  expand(Date) %>%                            # within each ID, expand to all dates
  left_join(df, by = c("ID", "Date")) -> df1  # join the original data back and save to df1
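For comparison, tidyr's complete() does the same job in one step on the ungrouped data: it turns implicitly missing combinations of the named columns into explicit rows and fills the other columns with NA. A minimal sketch, assuming dplyr and tidyr are installed; the data frame rebuilds the sample rows from the posted dput(), with Date as a factor:

```r
library(dplyr)
library(tidyr)

# Sample rows from the question; Date is a factor, as in the posted dput()
dates <- c("2014-10-01", "2015-04-01", "2015-07-01", "2015-09-01", "2015-11-01",
           "2016-01-01", "2016-05-01", "2016-08-01", "2017-03-01", "2017-05-01")
df <- data.frame(
  Date   = factor(dates[c(1:10, 1:3, 5:6)], levels = dates),
  ID     = c(rep(12L, 10), rep(13L, 3), rep(14L, 2)),
  Status = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L)
)

# Every ID x Date combination; pairs absent from df get Status = NA
df1 <- df %>%
  complete(ID, Date) %>%
  arrange(ID, Date)   # factor Date sorts by level order, i.e. chronologically
```

Because complete() works on the whole data frame rather than per group, it does not depend on the grouping step.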

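Or, to overwrite the original object with the pipeline's result instead of creating df1 (thanks to Renu's comment). The %<>% assignment pipe comes from the magrittr package, which library(tidyverse) does not attach by itself: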

library(magrittr)  # provides the %<>% assignment pipe
df %<>%
  group_by(ID) %>%
  expand(Date) %>%                     # within each ID, expand to all dates
  left_join(df, by = c("ID", "Date"))

which is equivalent to:

df %>%
  group_by(ID) %>%
  expand(Date) %>%                            # within each ID, expand to all dates
  left_join(df, by = c("ID", "Date")) -> df  # join the original data back and overwrite df
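The assignment pipe x %<>% f pipes x into f and then assigns the result back to x, which is why the two forms above are equivalent. A small sketch on a numeric vector, assuming magrittr is installed:

```r
library(magrittr)

x <- c(4, 9, 16)
y <- x

x %<>% sqrt        # assignment pipe: x becomes sqrt(x)
y <- y %>% sqrt    # explicit equivalent with the ordinary pipe

print(x)  # [1] 2 3 4
```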

Result:

   ID       Date Status
1  12 2014-10-01      1
2  12 2015-04-01      1
3  12 2015-07-01      1
4  12 2015-09-01      1
5  12 2015-11-01      0
6  12 2016-01-01      0
7  12 2016-05-01      0
8  12 2016-08-01      1
9  12 2017-03-01      1
10 12 2017-05-01      1
11 13 2014-10-01      1
12 13 2015-04-01      1
13 13 2015-07-01      0
14 13 2015-09-01     NA
15 13 2015-11-01     NA
16 13 2016-01-01     NA
17 13 2016-05-01     NA
18 13 2016-08-01     NA
19 13 2017-03-01     NA
20 13 2017-05-01     NA
21 14 2014-10-01     NA
22 14 2015-04-01     NA
23 14 2015-07-01     NA
24 14 2015-09-01     NA
25 14 2015-11-01      0
26 14 2016-01-01      0
27 14 2016-05-01     NA
28 14 2016-08-01     NA
29 14 2017-03-01     NA
30 14 2017-05-01     NA
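The same 30-row table can also be built without the tidyverse. A base-R sketch of the same idea (build the full ID-by-date grid, then left-join the observed rows) using expand.grid() and merge(all.x = TRUE); here the sample data are rebuilt from the posted dput() with Date kept as character, which sorts chronologically because the dates are in ISO format:

```r
dates <- c("2014-10-01", "2015-04-01", "2015-07-01", "2015-09-01", "2015-11-01",
           "2016-01-01", "2016-05-01", "2016-08-01", "2017-03-01", "2017-05-01")
df <- data.frame(
  Date   = dates[c(1:10, 1:3, 5:6)],
  ID     = c(rep(12L, 10), rep(13L, 3), rep(14L, 2)),
  Status = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Full skeleton: every ID paired with every date seen in the data
grid <- expand.grid(ID = sort(unique(df$ID)), Date = sort(unique(df$Date)),
                    stringsAsFactors = FALSE)

# Left join: ID/date pairs with no observation get Status = NA
df1 <- merge(grid, df, by = c("ID", "Date"), all.x = TRUE)
df1 <- df1[order(df1$ID, df1$Date), ]
rownames(df1) <- NULL
```

Taking the dates from the data only works if every date is observed for at least one ID; otherwise pass the complete list of dates to expand.grid() instead.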

Data:

> dput(df)
structure(list(Date = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 1L, 2L, 3L, 5L, 6L), .Label = c("2014-10-01", "2015-04-01", 
"2015-07-01", "2015-09-01", "2015-11-01", "2016-01-01", "2016-05-01", 
"2016-08-01", "2017-03-01", "2017-05-01"), class = "factor"), 
    ID = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
    13L, 13L, 13L, 14L, 14L), Status = c(1L, 1L, 1L, 1L, 0L, 
    0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L)), .Names = c("Date", 
"ID", "Status"), class = "data.frame", row.names = c(NA, -15L
))
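Note that dput() shows Date stored as a factor, and the grouped pipeline relies on that. On a grouped data frame, expand() works within each group. For a factor it uses every level, so each ID gets all 10 dates. For a Date or character column, the grouped expand() would return only the dates that ID already has. A small sketch contrasting the cases on made-up toy data, assuming dplyr and tidyr are installed:

```r
library(dplyr)
library(tidyr)

# Toy data with Date as a Date class (not a factor); ID 13 lacks 2015-04-01
d <- data.frame(
  Date   = as.Date(c("2014-10-01", "2015-04-01", "2014-10-01")),
  ID     = c(12L, 12L, 13L),
  Status = c(1L, 0L, 1L)
)

# Grouped expand() on a Date column: each ID only sees its own dates
per_group <- d %>% group_by(ID) %>% expand(Date) %>% ungroup()

# Same pipeline after converting Date to a factor: all levels in every group
per_group_fac <- d %>%
  mutate(Date = factor(Date)) %>%
  group_by(ID) %>% expand(Date) %>% ungroup()

# Ungrouped complete(): all IDs crossed with all dates seen anywhere
full <- d %>% complete(ID, Date)

nrow(per_group)      # 3
nrow(per_group_fac)  # 4
nrow(full)           # 4
```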
