Fill empty (NA) records in a data frame with zeros

Time: 2015-02-06 10:02:46

Tags: r

I have data in the following format:

data <-
structure(list(Well_N = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("KRT3", "KRT4"), class = "factor"), Date_m = structure(c(16251, 
16281, 16312, 16343, 16373, 16312, 16343, 16373, 16404), class = "Date"), 
QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31
)), class = "data.frame", row.names = c(NA, -9L), .Names = c("Well_N", 
"Date_m", "QOM"))

Printing data gives:

  Well_N     Date_m QOM
1   KRT3 2014-06-30 132
2   KRT3 2014-07-30  36
3   KRT3 2014-08-30  39
4   KRT3 2014-09-30 211
5   KRT3 2014-10-30  45
6   KRT4 2014-08-30 108
7   KRT4 2014-09-30 161
8   KRT4 2014-10-30  30
9   KRT4 2014-11-30  31

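Which function should I use to fill the records that do not exist with zero (0), so that every well (for example KRT4) covers the same date range? The desired output should look like this: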

  Well_N     Date_m QOM
1   KRT3 2014-06-30 132
2   KRT3 2014-07-30  36
3   KRT3 2014-08-30  39
4   KRT3 2014-09-30 211
5   KRT3 2014-10-30  45
6   KRT3 2014-11-30   0
7   KRT4 2014-06-30   0
8   KRT4 2014-07-30   0
9   KRT4 2014-08-30 108
10  KRT4 2014-09-30 161
11  KRT4 2014-10-30  30
12  KRT4 2014-11-30  31

Thanks

2 answers:

Answer 0 (score: 3)

One option is to use data.table. My understanding is that if a "Date_m" is missing in one or all of the groups ("Well_N"), the expected output should contain that missing "Date_m" with "QOM" set to 0. Convert the "data.frame" to a "data.table" (setDT) and set the key columns (setkey) to "Date_m" and "Well_N". Cross join (CJ) the monthly sequence from the min to the max of "Date_m" with the unique values of "Well_N". Assign 0 to the rows where "QOM" is NA, and order by "Well_N".

library(data.table)
setkey(setDT(data), Date_m, Well_N)[
     CJ(Date_m=seq(min(Date_m), max(Date_m), by='1 month'), 
     Well_N=unique(Well_N))][is.na(QOM), QOM:=0][order(Well_N)]
 #    Well_N     Date_m QOM
 # 1:   KRT3 2014-06-30 132
 # 2:   KRT3 2014-07-30  36
 # 3:   KRT3 2014-08-30  39
 # 4:   KRT3 2014-09-30 211
 # 5:   KRT3 2014-10-30  45
 # 6:   KRT3 2014-11-30   0
 # 7:   KRT4 2014-06-30   0
 # 8:   KRT4 2014-07-30   0
 # 9:   KRT4 2014-08-30 108
 #10:   KRT4 2014-09-30 161
 #11:   KRT4 2014-10-30  30
 #12:   KRT4 2014-11-30  31
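
To make the cross-join step easier to follow, here is a minimal sketch of what CJ returns on its own. It assumes the data.table package is installed; the dates and well names are copied from the question, and the variable names all_dates and grid are only illustrative.

```r
library(data.table)

# Every month between the first and last date observed in the question's data.
all_dates <- seq(as.Date("2014-06-30"), as.Date("2014-11-30"), by = "1 month")

# CJ() builds a keyed data.table with one row per combination of its arguments.
grid <- CJ(Date_m = all_dates, Well_N = c("KRT3", "KRT4"))

nrow(grid)  # number of date/well combinations (6 dates x 2 wells)
key(grid)   # the key columns that the join in the answer relies on
```

Joining the keyed data onto this grid leaves NA in "QOM" for every combination that has no record, which is what the is.na(QOM) step then replaces with 0.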

If all the "Well_N" groups share a common missing date ("Date_m") and the output should not include those dates in the range, we can reshape to "wide" and then convert back to "long":

  melt(dcast.data.table(setDT(data), Well_N~Date_m, value.var='QOM',
            drop=FALSE, fill=0), id='Well_N')[order(Well_N)]

Or, using a modification of the first solution, we replace the seq(...) call with unique(Date_m):

  setkey(setDT(data), Date_m, Well_N)[CJ(Date_m=unique(Date_m), 
       Well_N=unique(Well_N))][is.na(QOM), QOM:=0][order(Well_N)]
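
The difference between the seq and unique variants only shows up when a month is missing from every group. The following base R sketch uses hypothetical dates (not the question's data) in which no well has a record for 2014-08-30.

```r
# Hypothetical observed dates: no well has a record for 2014-08-30.
observed <- as.Date(c("2014-06-30", "2014-07-30", "2014-09-30", "2014-07-30"))

# Full monthly range: also contains the month that no well observed.
full_range <- seq(min(observed), max(observed), by = "1 month")

# Observed dates only: the common gap stays out of the result.
seen_only <- unique(observed)

# Dates that the seq() variant would add as all-zero rows.
setdiff(as.character(full_range), as.character(seen_only))
```

With seq, the common gap would appear in the output as a zero row for every well; with unique(Date_m) it would be left out.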

Answer 1 (score: 2)

This is easy in base R using xtabs:

as.data.frame(xtabs(QOM ~ Well_N + Date_m, data))
#   Well_N     Date_m Freq
#1    KRT3 2014-06-30  132
#2    KRT4 2014-06-30    0
#3    KRT3 2014-07-30   36
#4    KRT4 2014-07-30    0
#5    KRT3 2014-08-30   39
#6    KRT4 2014-08-30  108
#7    KRT3 2014-09-30  211
#8    KRT4 2014-09-30  161
#9    KRT3 2014-10-30   45
#10   KRT4 2014-10-30   30
#11   KRT3 2014-11-30    0
#12   KRT4 2014-11-30   31

You would just need to reorder the data using ?order.


Or you can even do it without order:

as.data.frame(xtabs(QOM ~ Date_m + Well_N, data))[c(2,1,3)]
#   Well_N     Date_m Freq
#1    KRT3 2014-06-30  132
#2    KRT3 2014-07-30   36
#3    KRT3 2014-08-30   39
#4    KRT3 2014-09-30  211
#5    KRT3 2014-10-30   45
#6    KRT3 2014-11-30    0
#7    KRT4 2014-06-30    0
#8    KRT4 2014-07-30    0
#9    KRT4 2014-08-30  108
#10   KRT4 2014-09-30  161
#11   KRT4 2014-10-30   30
#12   KRT4 2014-11-30   31
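
Two details of the xtabs output may matter downstream: the value column is named Freq, and Date_m comes back as a factor rather than a Date. A possible cleanup, sketched here with the sample data rebuilt from the question's printed output (the name filled is only illustrative):

```r
# Sample data rebuilt from the question's printed output.
data <- data.frame(
  Well_N = factor(rep(c("KRT3", "KRT4"), times = c(5, 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)

# responseName renames the value column; stringsAsFactors = FALSE keeps the
# classifying columns as character instead of factor.
filled <- as.data.frame(xtabs(QOM ~ Date_m + Well_N, data),
                        responseName = "QOM",
                        stringsAsFactors = FALSE)[c(2, 1, 3)]

# Restore the Date class, which the table's dimnames do not preserve.
filled$Date_m <- as.Date(filled$Date_m)

str(filled)
```

Note that xtabs sums QOM whenever a well/date pair occurs more than once, so this approach assumes at most one record per pair.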

Since it looks like they were asking for something different, here is how it can be done in base R (I use "testdata" instead of "data" here):

testdata <- merge(expand.grid(Date_m = seq(min(testdata$Date_m), max(testdata$Date_m), 
                by = "1 month"), Well_N = unique(testdata$Well_N)), 
                testdata, by = c("Date_m", "Well_N"), all.x = TRUE)
testdata$QOM[is.na(testdata$QOM)] <- 0
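
The snippet above assumes that testdata already exists. A self-contained sketch of the same merge approach, with the sample data rebuilt from the question and an extra ordering step so the rows match the desired output (grid and result are illustrative names), could look like this:

```r
# Sample data rebuilt from the question's printed output.
testdata <- data.frame(
  Well_N = factor(rep(c("KRT3", "KRT4"), times = c(5, 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)

# All date/well combinations over the full monthly range.
grid <- expand.grid(
  Date_m = seq(min(testdata$Date_m), max(testdata$Date_m), by = "1 month"),
  Well_N = unique(testdata$Well_N)
)

# Left join: combinations without a record get NA in QOM.
result <- merge(grid, testdata, by = c("Date_m", "Well_N"), all.x = TRUE)
result$QOM[is.na(result$QOM)] <- 0

# Order by well, then date, and restore the original column order.
result <- result[order(result$Well_N, result$Date_m), c("Well_N", "Date_m", "QOM")]
rownames(result) <- NULL
result
```

Unlike the xtabs version, this keeps Date_m as a Date and QOM under its original name.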