How to create a data frame with a large number of rows in R

Date: 2016-10-30 09:36:25

Tags: r performance dataframe

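I want to create a data frame with a large number of rows, one for every 10-minute time step, and then copy in the data from another data frame that is missing some of those time steps. The missing steps should appear as rows of NA.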

df.org
                  time    t    h  p   s
1  2016-10-30 10:10:00 33.6 21.3 NA STA
2  2016-10-30 10:50:00 33.7 19.8 NA STA
3  2016-10-30 11:00:00 33.7 18.4 NA STB
4  2016-10-30 11:10:00 34.3 19.3 NA STB
5  2016-10-30 11:20:00 33.9 19.4 NA STB
6  2016-10-30 11:30:00 34.4 20.9 NA STA
7  2016-10-30 11:40:00 34.8 21.1 NA STB
8  2016-10-30 11:50:00 34.6 21.2 NA STB
9  2016-10-30 12:00:00 34.6 22.1 NA STA
10 2016-10-30 12:10:00 34.9 20.8 NA STC
11 2016-10-30 12:20:00 34.9 21.7 NA STC
12 2016-10-30 12:30:00 35.0 21.9 NA STA
13 2016-10-30 12:50:00 35.1 22.6 NA STA

This is the result I expect:

df.wNA
                  time     t     h     p      s
1  2016-10-30 10:10:00  33.6  21.3    NA    STA
2  2016-10-30 10:20:00    NA    NA    NA     NA
3  2016-10-30 10:30:00    NA    NA    NA     NA
4  2016-10-30 10:40:00    NA    NA    NA     NA
5  2016-10-30 10:50:00  33.7  19.8    NA    STA
6  2016-10-30 11:00:00  33.7  18.4    NA    STB
7  2016-10-30 11:10:00  34.3  19.3    NA    STB
8  2016-10-30 11:20:00  33.9  19.4    NA    STB
9  2016-10-30 11:30:00  34.4  20.9    NA    STA
10 2016-10-30 11:40:00  34.8  21.1    NA    STB
11 2016-10-30 11:50:00  34.6  21.2    NA    STB
12 2016-10-30 12:00:00  34.6  22.1    NA    STA
13 2016-10-30 12:10:00  34.9  20.8    NA    STC
14 2016-10-30 12:20:00  34.9  21.7    NA    STC
15 2016-10-30 12:30:00  35.0  21.9    NA    STA
16 2016-10-30 12:40:00    NA    NA    NA     NA
17 2016-10-30 12:50:00  35.1  22.6    NA    STA
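The expected frame has one row per 10-minute step from 10:10 to 12:50, i.e. 160 / 10 + 1 = 17 rows, and the steps absent from df.org (10:20, 10:30, 10:40 and 12:40) become all-NA rows. A small sketch that derives those numbers from the observed timestamps, assuming UTC (a choice made here so that daylight-saving changes cannot shift the grid; the question's code uses the local time zone):

```r
# Observed timestamps from df.org (UTC is an assumption made for this sketch)
obs <- as.POSIXct(c("2016-10-30 10:10:00", "2016-10-30 10:50:00",
                    "2016-10-30 11:00:00", "2016-10-30 11:10:00",
                    "2016-10-30 11:20:00", "2016-10-30 11:30:00",
                    "2016-10-30 11:40:00", "2016-10-30 11:50:00",
                    "2016-10-30 12:00:00", "2016-10-30 12:10:00",
                    "2016-10-30 12:20:00", "2016-10-30 12:30:00",
                    "2016-10-30 12:50:00"), tz = "UTC")

# Every 10-minute step between the first and the last observation
grid <- seq(min(obs), max(obs), by = "10 min")
length(grid)                  # 17

# Steps with no observation: these are the rows that must be all NA
gaps <- grid[!grid %in% obs]
format(gaps, "%H:%M")         # "10:20" "10:30" "10:40" "12:40"
```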

Code:

library(dplyr)  # bind_rows() comes from dplyr

time <- as.POSIXct(c("2016-10-30 10:10:00", "2016-10-30 10:50:00", "2016-10-30 11:00:00",
                     "2016-10-30 11:10:00", "2016-10-30 11:20:00", "2016-10-30 11:30:00",
                     "2016-10-30 11:40:00", "2016-10-30 11:50:00", "2016-10-30 12:00:00",
                     "2016-10-30 12:10:00", "2016-10-30 12:20:00", "2016-10-30 12:30:00",
                     "2016-10-30 12:50:00"))
t <- c( 33.6, 33.7, 33.7, 34.3, 33.9, 34.4, 34.8, 34.6, 34.6, 34.9, 34.9, 35.0, 35.1 )
h <- c( 21.3, 19.8, 18.4, 19.3, 19.4, 20.9, 21.1, 21.2, 22.1, 20.8, 21.7, 21.9, 22.6 )
p <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
s <- c( "STA", "STA", "STB", "STB", "STB", "STA", "STB", "STB", "STA", "STC", "STC", "STA", "STA" )

df.org <- data.frame(time, t, h, p, s, stringsAsFactors = FALSE)
fr <- min(df.org$time)
to <- max(df.org$time)
times <- seq(fr, to, by = 60*10)   # one time stamp every 10 minutes (by is in seconds)
df.wNA <- subset(df.org, FALSE)    # empty data frame with df.org's columns
for (jth in seq_along(times)) {
  # build one all-NA row and stamp it with the current time
  ro <- as.data.frame(lapply(df.org[1, ], function(x) { rep(NA, length(x)) } ))
  ro$time <- times[jth]
  df.wNA <- bind_rows(df.wNA, ro)  # appends one row per iteration
}

# copy the observed rows into the slots with exactly matching time stamps
df.wNA[match(df.org$time, df.wNA$time), ] <- df.org
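Growing df.wNA with bind_rows() inside the loop builds a new, one-row-longer data frame on every iteration, so the total work grows roughly quadratically with length(times). A minimal base-R sketch of the same fill strategy without the loop: allocate the whole all-NA frame in one call, then copy the observed rows in with a single match()-indexed assignment. It assumes the question's columns (time, t, h, p, s) and UTC timestamps, and uses a four-row subset of the question's data for brevity.

```r
# Four-row subset of the question's df.org (UTC assumed; p stored as numeric NA)
df.org <- data.frame(
  time = as.POSIXct(c("2016-10-30 10:10:00", "2016-10-30 10:50:00",
                      "2016-10-30 11:00:00", "2016-10-30 12:50:00"), tz = "UTC"),
  t = c(33.6, 33.7, 33.7, 35.1),
  h = c(21.3, 19.8, 18.4, 22.6),
  p = NA_real_,
  s = c("STA", "STA", "STB", "STA"),
  stringsAsFactors = FALSE
)

# Every 10-minute step from the first to the last reading
times <- seq(min(df.org$time), max(df.org$time), by = "10 min")

# Allocate the full result once: one row per time step, all other columns NA
df.wNA <- data.frame(
  time = times,
  t = NA_real_, h = NA_real_, p = NA_real_,
  s = NA_character_,
  stringsAsFactors = FALSE
)

# Write the observed rows into their time slots in one vectorised assignment
idx <- match(df.org$time, df.wNA$time)
df.wNA[idx, ] <- df.org
```

Only two data frames are created here regardless of length(times), instead of one per time step.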

But this is far too slow when length(times) is large. How can I speed it up?
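One common way to avoid building rows one at a time is a left join of the complete time grid against the observations, which base R's merge() performs in a single call with all.x = TRUE. A minimal sketch, assuming UTC timestamps that fall exactly on 10-minute marks (merge() matches keys exactly, so a reading at 10:11 would not land on the 10:10 row). merge() sorts the result by the key column by default, so the rows come out in time order.

```r
# The question's data, with an explicit time zone (UTC assumed)
time <- as.POSIXct(c("2016-10-30 10:10:00", "2016-10-30 10:50:00",
                     "2016-10-30 11:00:00", "2016-10-30 11:10:00",
                     "2016-10-30 11:20:00", "2016-10-30 11:30:00",
                     "2016-10-30 11:40:00", "2016-10-30 11:50:00",
                     "2016-10-30 12:00:00", "2016-10-30 12:10:00",
                     "2016-10-30 12:20:00", "2016-10-30 12:30:00",
                     "2016-10-30 12:50:00"), tz = "UTC")
df.org <- data.frame(
  time = time,
  t = c(33.6, 33.7, 33.7, 34.3, 33.9, 34.4, 34.8, 34.6, 34.6, 34.9, 34.9, 35.0, 35.1),
  h = c(21.3, 19.8, 18.4, 19.3, 19.4, 20.9, 21.1, 21.2, 22.1, 20.8, 21.7, 21.9, 22.6),
  p = NA_real_,
  s = c("STA", "STA", "STB", "STB", "STB", "STA", "STB", "STB", "STA",
        "STC", "STC", "STA", "STA"),
  stringsAsFactors = FALSE
)

# Complete 10-minute grid from the first to the last observation
grid <- data.frame(time = seq(min(df.org$time), max(df.org$time), by = "10 min"))

# Left join: keep every grid row; steps without a reading get NA in t, h, p, s
df.wNA <- merge(grid, df.org, by = "time", all.x = TRUE)
```

With dplyr loaded, left_join(grid, df.org, by = "time") expresses the same join; either way there is no per-row loop.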

Thanks.

0 answers