I have data in the following format:
data <-
structure(list(Well_N = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), .Label = c("KRT3", "KRT4"), class = "factor"), Date_m = structure(c(16251,
16281, 16312, 16343, 16373, 16312, 16343, 16373, 16404), class = "Date"),
QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31
)), class = "data.frame", row.names = c(NA, -9L), .Names = c("Well_N",
"Date_m", "QOM"))
data
The output is as follows:
Well_N Date_m QOM
1 KRT3 2014-06-30 132
2 KRT3 2014-07-30 36
3 KRT3 2014-08-30 39
4 KRT3 2014-09-30 211
5 KRT3 2014-10-30 45
6 KRT4 2014-08-30 108
7 KRT4 2014-09-30 161
8 KRT4 2014-10-30 30
9 KRT4 2014-11-30 31
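If I want to fill the non-existent values with zero (0) so that every Well (KRT3 and KRT4) covers the same date range, which function should I use? The desired output should look like this: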
Well_N Date_m QOM
1 KRT3 2014-06-30 132
2 KRT3 2014-07-30 36
3 KRT3 2014-08-30 39
4 KRT3 2014-09-30 211
5 KRT3 2014-10-30 45
6 KRT3 2014-11-30 0
7 KRT4 2014-06-30 0
8 KRT4 2014-07-30 0
9 KRT4 2014-08-30 108
10 KRT4 2014-09-30 161
11 KRT4 2014-10-30 30
12 KRT4 2014-11-30 31
Thanks
Answer 0 (score: 3)
One option is to use data.table. My understanding is that if a "Date_m" is missing in one or all of the groups ("Well_N"), the expected output should contain that missing "Date_m" with "QOM" set to 0. Convert the "data.frame" to a "data.table" (setDT) and set the key columns (setkey) to "Date_m" and "Well_N". Then do a cross join (CJ) of the sequence from min to max of "Date_m" with the unique values of "Well_N", assign 0 to the "QOM" values that are NA, and order by "Well_N".
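library(data.table)
# Key on the join columns, cross-join every month with every well,
# then set QOM to 0 in the newly created (NA) rows and sort by well
setkey(setDT(data), Date_m, Well_N)[
  CJ(Date_m = seq(min(Date_m), max(Date_m), by = '1 month'),
     Well_N = unique(Well_N))][is.na(QOM), QOM := 0][order(Well_N)]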
# Well_N Date_m QOM
# 1: KRT3 2014-06-30 132
# 2: KRT3 2014-07-30 36
# 3: KRT3 2014-08-30 39
# 4: KRT3 2014-09-30 211
# 5: KRT3 2014-10-30 45
# 6: KRT3 2014-11-30 0
# 7: KRT4 2014-06-30 0
# 8: KRT4 2014-07-30 0
# 9: KRT4 2014-08-30 108
#10: KRT4 2014-09-30 161
#11: KRT4 2014-10-30 30
#12: KRT4 2014-11-30 31
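The chain above can be pulled apart to see what each step produces. This is a minimal sketch, assuming the question's data rebuilt with data.table() instead of the dput() output; the object names grid and joined are illustrative only.

```r
library(data.table)

# The question's data, rebuilt with data.table() so the snippet runs on its own
data <- data.table(
  Well_N = factor(c(rep("KRT3", 5), rep("KRT4", 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)
setkey(data, Date_m, Well_N)

# Step 1: every month in the range crossed with every well (6 x 2 = 12 rows)
grid <- CJ(Date_m = seq(min(data$Date_m), max(data$Date_m), by = "1 month"),
           Well_N = unique(data$Well_N))

# Step 2: keyed join; combinations absent from `data` get QOM = NA
joined <- data[grid]

# Step 3: overwrite those NAs with 0 by reference, then sort by well
joined[is.na(QOM), QOM := 0]
joined[order(Well_N)]
```

The grid has 12 rows while the data has 9, so exactly three rows (KRT3 in 2014-11 and KRT4 in 2014-06 and 2014-07) arrive from the join with NA and are set to 0.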
If all the "Well_N" groups share a common missing date ("Date_m") and the output should not include those dates from the range, we can reshape to "wide" and then back to "long":
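wide <- dcast(setDT(data), Well_N ~ Date_m, value.var = 'QOM', drop = FALSE, fill = 0)
long <- melt(wide, id.vars = 'Well_N', variable.name = 'Date_m', value.name = 'QOM')
# melt() returns the former column names as a factor; convert them back to Date
long[, Date_m := as.Date(as.character(Date_m))][order(Well_N)]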
Or, as a modification of the first solution, replace the seq() call with unique(Date_m):
setkey(setDT(data), Date_m, Well_N)[CJ(Date_m=unique(Date_m),
Well_N=unique(Well_N))][is.na(QOM), QOM:=0][order(Well_N)]
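The two variants only differ when a month is missing from every well. A small sketch with hypothetical data (not from the question) in which no well has a row for 2014-08-30: the seq() grid still contains that month, the unique(Date_m) grid does not. The names dt, full and observed are illustrative.

```r
library(data.table)

# Hypothetical data (not from the question): no well has a row for 2014-08-30
dt <- data.table(
  Well_N = factor(c("KRT3", "KRT3", "KRT3", "KRT4", "KRT4")),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-09-30",
                     "2014-09-30", "2014-10-30")),
  QOM = c(132, 36, 211, 161, 30)
)
setkey(dt, Date_m, Well_N)

# Variant 1: full monthly sequence, so 2014-08-30 is added with QOM = 0
full <- dt[CJ(Date_m = seq(min(dt$Date_m), max(dt$Date_m), by = "1 month"),
              Well_N = unique(dt$Well_N))][is.na(QOM), QOM := 0][]

# Variant 2: only dates that occur somewhere in the data, so 2014-08-30 stays out
observed <- dt[CJ(Date_m = unique(dt$Date_m),
                  Well_N = unique(dt$Well_N))][is.na(QOM), QOM := 0][]

nrow(full)      # 5 months x 2 wells = 10
nrow(observed)  # 4 observed dates x 2 wells = 8
```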
Answer 1 (score: 2)
Using xtabs:
as.data.frame(xtabs(QOM ~ Well_N + Date_m, data))
# Well_N Date_m Freq
#1 KRT3 2014-06-30 132
#2 KRT4 2014-06-30 0
#3 KRT3 2014-07-30 36
#4 KRT4 2014-07-30 0
#5 KRT3 2014-08-30 39
#6 KRT4 2014-08-30 108
#7 KRT3 2014-09-30 211
#8 KRT4 2014-09-30 161
#9 KRT3 2014-10-30 45
#10 KRT4 2014-10-30 30
#11 KRT3 2014-11-30 0
#12 KRT4 2014-11-30 31
You just need to reorder the data with ?order.
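For the first xtabs() call, where Well_N varies fastest, the reordering could look like this sketch. It assumes the question's data rebuilt with data.frame(); res is an illustrative name. Because xtabs() turns Date_m into a factor whose levels are the sorted dates, ordering by Well_N and then Date_m keeps each well's months in chronological order.

```r
# The question's data, rebuilt with data.frame() so the snippet runs on its own
data <- data.frame(
  Well_N = factor(c(rep("KRT3", 5), rep("KRT4", 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)

res <- as.data.frame(xtabs(QOM ~ Well_N + Date_m, data))

# Sort by well, then by the (chronologically ordered) Date_m factor levels
res <- res[order(res$Well_N, res$Date_m), ]
rownames(res) <- NULL
res
```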
Or you can even do it without order:
as.data.frame(xtabs(QOM ~ Date_m + Well_N, data))[c(2,1,3)]
# Well_N Date_m Freq
#1 KRT3 2014-06-30 132
#2 KRT3 2014-07-30 36
#3 KRT3 2014-08-30 39
#4 KRT3 2014-09-30 211
#5 KRT3 2014-10-30 45
#6 KRT3 2014-11-30 0
#7 KRT4 2014-06-30 0
#8 KRT4 2014-07-30 0
#9 KRT4 2014-08-30 108
#10 KRT4 2014-09-30 161
#11 KRT4 2014-10-30 30
#12 KRT4 2014-11-30 31
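The xtabs() result differs from the input in two ways: the value column is called Freq, and Date_m has become a factor. A minimal sketch of converting it back, assuming the question's data rebuilt with data.frame(); out is an illustrative name. Like the unique(Date_m) variant in the first answer, xtabs() only creates cells for dates that actually occur in the data, and if a well had two rows for the same date their QOM values would be summed.

```r
# The question's data, rebuilt with data.frame() so the snippet runs on its own
data <- data.frame(
  Well_N = factor(c(rep("KRT3", 5), rep("KRT4", 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)

# Date_m varies fastest, so each well's rows come out together
out <- as.data.frame(xtabs(QOM ~ Date_m + Well_N, data))[c(2, 1, 3)]

# Restore the question's column name and turn the factor back into a Date
names(out)[names(out) == "Freq"] <- "QOM"
out$Date_m <- as.Date(as.character(out$Date_m))
str(out)
```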
Since it seems something different was being asked for, here is how it can be done in base R (I use "testdata" instead of "data" here):
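# All month x well combinations, left-joined with the observed rows
testdata <- merge(expand.grid(Date_m = seq(min(testdata$Date_m), max(testdata$Date_m),
                                           by = "1 month"),
                              Well_N = unique(testdata$Well_N)),
                  testdata, by = c("Date_m", "Well_N"), all.x = TRUE)
# Combinations with no observation come back as NA; replace them with 0
testdata$QOM[is.na(testdata$QOM)] <- 0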
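merge() sorts its result by the by columns, so the filled-in data comes back ordered by Date_m first. A short sketch, assuming testdata holds the question's data (rebuilt here with data.frame()), that repeats the fill step and then reorders by Well_N and Date_m so the rows line up with the desired output; grid and filled are illustrative names.

```r
# The question's data, rebuilt with data.frame() so the snippet runs on its own
testdata <- data.frame(
  Well_N = factor(c(rep("KRT3", 5), rep("KRT4", 4))),
  Date_m = as.Date(c("2014-06-30", "2014-07-30", "2014-08-30", "2014-09-30",
                     "2014-10-30", "2014-08-30", "2014-09-30", "2014-10-30",
                     "2014-11-30")),
  QOM = c(132, 36, 39, 211, 45, 108, 161, 30, 31)
)

# Fill step: every month x well combination, left-joined with the data
grid <- expand.grid(Date_m = seq(min(testdata$Date_m), max(testdata$Date_m),
                                 by = "1 month"),
                    Well_N = unique(testdata$Well_N))
filled <- merge(grid, testdata, by = c("Date_m", "Well_N"), all.x = TRUE)
filled$QOM[is.na(filled$QOM)] <- 0

# Sort by well, then date, restore the original column order, renumber rows
filled <- filled[order(filled$Well_N, filled$Date_m), c("Well_N", "Date_m", "QOM")]
rownames(filled) <- NULL
filled
```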