计算数据帧中的特定模式,在R中进行插值

时间:2016-07-11 11:03:53

标签: r dataframe interpolation

我有一个数据框格式的数据集,如下所示:

wpt    ID   Fuel  Express   Local
 1     S36   12      0         1
 2     S36   14      1         0
 inter S36   NA      1         0
 inter S36   NA      1         0
 3     S36   16      1         0
 inter S36   NA      0         1
 4     S36   18      1         0
 5     S36   22      1         0
 6     W09   45      0         1
 inter W09   NA      1         0
 7     W09   48      0         1

我想处理子数据dat [c(2,inter,inter,3),](任何部分用" inter"结合常规编号wpt )作为一个单位。

(1)计算我的数据框中有多少个这样的子单元,在这种情况下它有两个(单元行,从2到3,以及3到4)

(2)然后计算从子单元的起始值到结束值 Express或Local值一致的单位数。在这种情况下,它有1个这样的单位一致(第2行,第3行,所有Express )和1个单位不同(第3行到第4行,以Express开头和结尾,从这些单位的起始值或结束值开始,就是本地

(3)所有计算均按身份证明。

预期的输出是这样的:

ID   consistent    total
S36      1            2
W09      0            1

(4)如果我想在Fuel列中插入缺失值怎么办?做简单的线性插值。就像前两个NAs被14.66667和15.33333所取代,它们来自:

seq(14, 16, length.out=3)

这样的预期结果:

wpt    ID   Fuel    Express   Local
 1     S36   12        0         1
 2     S36   14        1         0
 inter S36   14.66667  1         0
 inter S36   15.33333  1         0
 3     S36   16        1         0
 inter S36   17        0         1
 4     S36   18        1         0
 5     S36   22        1         0
 6     W09   45        0         1
 inter W09   45.75     1         0
 inter W09   46.50     1         0
 inter W09   47.25     1         0
 7     W09   48        0         1

提前致谢!

3 个答案:

答案 0 :(得分:3)

subs <- with(rle(df$wpt),{
    ends <- cumsum(lengths);
    n <- grepl('^[0-9]+$',values);
    w <- which(head(n,-2L) & values[-c(1L,length(n))]=='inter' & tail(n,-2L));
    data.frame(start=c(0L,ends)[w]+1L,end=ends[w+2L]);
});
subs$ID <- df$ID[subs$start];
subs$consistent <- mapply(function(s,e,eq) all(eq[s:e]),subs$start,subs$end-1L,MoreArgs=list(diff(df$Express)==0L));
subs;
##   start end  ID consistent
## 1     2   5 S36       TRUE
## 2     5   7 S36      FALSE
## 3     9  11 W09      FALSE
res <- aggregate(cbind(consistent,total=rep(1L,length(ID)))~ID,subs,sum);
res;
##    ID consistent total
## 1 S36          1     2
## 2 W09          0     1

数据

df <- data.frame(wpt=c('1','2','inter','inter','3','inter','4','5','6','inter','7'),ID=c(
'S36','S36','S36','S36','S36','S36','S36','S36','W09','W09','W09'),Fuel=c(12L,14L,NA,NA,16L,
NA,18L,22L,45L,NA,48L),Express=c(0L,1L,1L,1L,1L,0L,1L,1L,0L,1L,0L),Local=c(1L,0L,0L,0L,0L,1L,
0L,0L,1L,0L,1L),stringsAsFactors=F);

答案 1 :(得分:1)

如果我的意图是正确的,我对使用data.tablerle的案例(1)的建议是

library(data.table)
dt <- fread("
       wpt    ID   Fuel  Express   Local
        1     S36   12      0         1
        2     S36   14      1         0
        inter S36   NA      1         0
        inter S36   NA      1         0
        3     S36   16      1         0
        inter S36   NA      0         1
        4     S36   18      1         0
        5     S36   22      1         0
        6     W09   45      0         1
        inter W09   NA      1         0
        7     W09   48      0         1")
rdt <- dt[, rle(wpt), by = ID]
rdt[values == "inter" & lengths >= 1, .(total = .N), by = ID]

导致

    ID total
1: S36     2
2: W09     1

答案 2 :(得分:1)

要计算模式,您可以使用rle。例如:

x <- rle(df$wpt == 'inter')
y <- which(x$values)
cumsum(x$lengths)[y - 1L] + 1L          # run starts
#[1]  3  6 10
x$lengths[y]                            # run lengths
#[1] 2 1 1
subunits <- lapply(y, function(i)
    seq(cumsum(x$lengths)[i - 1L], length.out=x$lengths[i] + 2L))

现在subunits是行索引列表

subunits
#[[1]]
#[1] 2 3 4 5
#
#[[2]]
#[1] 5 6 7
#
#[[3]]
#[1]  9 10 11

但要插入缺失值,这些都不是必需的,而是可以做

nas <- is.na(df$Fuel)
df$Fuel[nas] <- approx(seq(nrow(df)), df$Fuel, xout=which(nas))$y
df
#     wpt  ID     Fuel Express Local
#1      1 S36 12.00000       0     1
#2      2 S36 14.00000       1     0
#3  inter S36 14.66667       1     0
#4  inter S36 15.33333       1     0
#5      3 S36 16.00000       1     0
#6  inter S36 17.00000       0     1
#7      4 S36 18.00000       1     0
#8      5 S36 22.00000       1     0
#9      6 W09 45.00000       0     1
#10 inter W09 46.50000       1     0
#11     7 W09 48.00000       0     1