Using R

Time: 2018-04-19 00:47:37

Tags: r dataframe

I have the following table:

Name        Date       Quiz    Homework   
John      11-01-02      40        10
John      11-01-03      47        20
John      11-01-04      41        10
John      11-01-08      35        10
John      11-01-10      43        15
John      11-01-13      40        10
Adam      11-01-05      41        10
Adam      11-01-08      41        15
Adam      11-01-14      49        10
Adam      11-01-19      40        20
Adam      11-01-21      40        10

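As you can see, there are gaps in the dates. I want to fill in these gaps for each name and set the Quiz and Homework scores for the missing dates to zero. The final result I want is therefore the following: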

Name        Date       Quiz    Homework   
John      11-01-02      40        10
John      11-01-03      47        20
John      11-01-04      41        10
John      11-01-05      0          0
John      11-01-06      0          0
John      11-01-07      0          0
John      11-01-08      35        10
John      11-01-09      0          0
John      11-01-10      43        15
John      11-01-11      0          0
John      11-01-12      0          0
John      11-01-13      40        10
Adam      11-01-05      41        10
Adam      11-01-06      0          0
Adam      11-01-07      0          0
Adam      11-01-08      41        15
Adam      11-01-09      0          0
Adam      11-01-10      0          0
Adam      11-01-11      0          0
Adam      11-01-12      0          0
Adam      11-01-13      0          0
Adam      11-01-14      49        10
Adam      11-01-15      0          0
Adam      11-01-16      0          0
Adam      11-01-17      0          0
Adam      11-01-18      0          0
Adam      11-01-19      40        20
Adam      11-01-20      0          0
Adam      11-01-21      40        10

Is there a fast way to do this? What I did is the following:

1) Find the minimum and maximum dates for each name
2) For each name, create a sequence of dates between the minimum and maximum dates found in step 1)
3) Join the table created in step 2) with the original table
4) Replace the NA values in Quiz and Homework with zero

But this is slow. I would like to know whether there is a faster way.
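
For reference, the four steps above might look roughly like this in base R. The asker's actual code is not shown, so this is only an assumed sketch; the names `df`, `grid`, and `out` and the shortened sample data are illustrative.

```r
# Shortened sample in the same shape as the table above (illustrative data)
df <- data.frame(
  Name     = c("John", "John", "Adam", "Adam"),
  Date     = as.Date(c("2011-01-02", "2011-01-04", "2011-01-05", "2011-01-08")),
  Quiz     = c(40L, 41L, 41L, 41L),
  Homework = c(10L, 10L, 10L, 15L)
)

# Steps 1 and 2: per name, build the daily sequence from the first to the last date
grid <- do.call(rbind, lapply(split(df, df$Name), function(g) {
  data.frame(Name = g$Name[1],
             Date = seq(min(g$Date), max(g$Date), by = "day"))
}))

# Step 3: left join the full date grid with the original table
out <- merge(grid, df, by = c("Name", "Date"), all.x = TRUE)

# Step 4: replace the NA scores with zero
out$Quiz[is.na(out$Quiz)] <- 0L
out$Homework[is.na(out$Homework)] <- 0L
```

Building one small data frame per name and binding them together is a plausible source of the slowness on large inputs, which is what the answers below try to avoid.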

2 answers:

Answer 0: (score: 1)

A solution using the data.table package should be fast:

library(data.table)

DT <- fread("Name        Date       Quiz    Homework   
John      11-01-02      40        10
John      11-01-03      47        20
John      11-01-04      41        10
John      11-01-08      35        10
John      11-01-10      43        15
John      11-01-13      40        10
Adam      11-01-05      41        10
Adam      11-01-08      41        15
Adam      11-01-14      49        10
Adam      11-01-19      40        20
Adam      11-01-21      40        10")
DT[, Date := as.Date(Date, "%y-%m-%d")]

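# Build the full per-name date grid, join the original rows onto it,
# then zero-fill the scores for dates that had no entry.
result <- DT[DT[, .(Date=seq(min(Date), max(Date), by="1 day")), by=.(Name)],
    on=.(Name, Date)][,
        `:=` (
            Quiz = ifelse(is.na(Quiz), 0, Quiz),
            Homework = ifelse(is.na(Homework), 0, Homework)
        )]
result[]  # := returns invisibly; the trailing [] forces printing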

Explanation:

  1. Create the sequence of dates for each name with allDates <- DT[, .(Date=seq(min(Date), max(Date), by="1 day")), by=.(Name)]
  2. Join it with the original dataset using DT[allDates, on=.(Name, Date)]
  3. Finally, replace NA with 0
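
The same three steps can also be written as separate statements, which may be easier to inspect than the chained call. This is a minimal sketch on a shortened, illustrative sample; `filled` is a name introduced here, and the NA replacement uses a subset assignment rather than the `ifelse` call from the answer.

```r
library(data.table)

# Shortened sample in the same shape as the question's data (illustrative)
DT <- data.table(
  Name     = c("John", "John", "Adam", "Adam"),
  Date     = as.Date(c("2011-01-02", "2011-01-04", "2011-01-05", "2011-01-08")),
  Quiz     = c(40L, 41L, 41L, 41L),
  Homework = c(10L, 10L, 10L, 15L)
)

# Step 1: one row per name and calendar day between that name's first and last date
allDates <- DT[, .(Date = seq(min(Date), max(Date), by = "1 day")), by = .(Name)]

# Step 2: join so that every row of allDates is kept;
# days that are missing from DT get NA in Quiz and Homework
filled <- DT[allDates, on = .(Name, Date)]

# Step 3: replace the NA scores with 0, modifying filled by reference
filled[is.na(Quiz), Quiz := 0L]
filled[is.na(Homework), Homework := 0L]

filled[]
```

Assigning only to the rows where the score is NA keeps the columns as integers, whereas `ifelse(is.na(Quiz), 0, Quiz)` converts them to doubles.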

Answer 1: (score: 1)

A tidyverse solution:

library(dplyr)
library(tidyr)
library(lubridate) # for easier year conversion

df1 <- structure(list(Name = c("John", "John", "John", "John", "John", 
                               "John", "Adam", "Adam", "Adam", "Adam", "Adam"), 
                      Date = c("11-01-02", "11-01-03", "11-01-04", 
                               "11-01-08", "11-01-10", "11-01-13", 
                               "11-01-05", "11-01-08", "11-01-14", 
                               "11-01-19", "11-01-21"), 
                      Quiz = c(40L, 47L, 41L, 35L, 43L, 40L, 41L, 41L, 49L, 40L, 40L), 
                      Homework = c(10L, 20L, 10L, 10L, 15L, 10L, 
                                   10L, 15L, 10L, 20L, 10L)), 
                      .Names = c("Name", "Date", "Quiz", "Homework"), 
                      class = "data.frame", 
                      row.names = c(NA, -11L))

df1 %>% 
  mutate(Date = ymd(Date)) %>%  # parses "11-01-02" as 2011-01-02
  group_by(Name) %>% 
  complete(Date = seq(min(Date), max(Date), by = "1 day"), 
           fill = list(Quiz = 0, Homework = 0))

   Name       Date Quiz Homework
1  Adam 2011-01-05   41       10
2  Adam 2011-01-06    0        0
3  Adam 2011-01-07    0        0
4  Adam 2011-01-08   41       15
5  Adam 2011-01-09    0        0
6  Adam 2011-01-10    0        0
7  Adam 2011-01-11    0        0
8  Adam 2011-01-12    0        0
9  Adam 2011-01-13    0        0
10 Adam 2011-01-14   49       10
11 Adam 2011-01-15    0        0
12 Adam 2011-01-16    0        0
13 Adam 2011-01-17    0        0
14 Adam 2011-01-18    0        0
15 Adam 2011-01-19   40       20
16 Adam 2011-01-20    0        0
17 Adam 2011-01-21   40       10
18 John 2011-01-02   40       10
19 John 2011-01-03   47       20
20 John 2011-01-04   41       10
21 John 2011-01-05    0        0
22 John 2011-01-06    0        0
23 John 2011-01-07    0        0
24 John 2011-01-08   35       10
25 John 2011-01-09    0        0
26 John 2011-01-10   43       15
27 John 2011-01-11    0        0
28 John 2011-01-12    0        0
29 John 2011-01-13   40       10