根据列阈值将行拆分为多行

时间:2017-08-10 18:01:22

标签: r split dplyr

我有一个如下数据集:

dat2 <- read.table(header=TRUE, text="
ID  LT  LF  Dist  TM  NM
a001    0   3   3 p001    pn07
                   a003 4.1 4.9 0.8 p002    pn08
                   a004 2.1 6.6 4.5 p003    pn09
                   a008 8.7 12    3.3 p004    pn10
                   a009 2   4.3 2.3 p005    pn11
                   a010 1   4   3 p006    pn12
                   a023 1   3   2 p007    pn13
                   a024 1   3   2 p008    pn14
                   ")

dat2
    ID  LT   LF Dist   TM   NM
1 a001 0.0  3.0  3.0 p001 pn07
2 a003 4.1  4.9  0.8 p002 pn08
3 a004 2.1  6.6  4.5 p003 pn09
4 a008 8.7 12.0  3.3 p004 pn10
5 a009 2.0  4.3  2.3 p005 pn11
6 a010 1.0  4.0  3.0 p006 pn12
7 a023 1.0  3.0  2.0 p007 pn13
8 a024 1.0  3.0  2.0 p008 pn14

我想根据Dist阈值拆分行。如果Dist >=2.0它会将行分成几行。还需要相应更改LTLF。例如,考虑a002。它的Dist为4.5。它以LT=2.1开头,以LF=6.6结尾。对于拆分行,LTLF将如下所示:

a002: LT=2.1 LF=4.1 (calculation= 2.1+`Dist=2`) Dist=2
a002: LT=4.1 LF=6.1 (calculation= 4.1+`Dist=2`) Dist=2
a002: LT=6.1 LF=6.6 (endpoint,LF is `6.6` and `Dist < 2`) Dist=0.5

最终输出将是:

   ID   LT   LF Dist   TM   NM
1  a001  0.0  2.0  2.0 p001 pn07
2  a001  2.0  3.0  1.0 p001 pn07
3  a003  4.1  4.9  0.8 p002 pn08
4  a004  2.1  4.1  2.0 p003 pn09
5  a004  4.1  6.1  2.0 p003 pn09
6  a004  6.1  6.6  0.5 p003 pn09
7  a008  8.7 10.7  2.0 p004 pn10
8  a008 10.7 12.0  1.3 p004 pn10
9  a009  2.0  4.0  2.0 p005 pn11
10 a009  4.0  4.3  0.3 p005 pn11
11 a010  1.0  3.0  2.0 p006 pn12
12 a010  3.0  4.0  1.0 p006 pn12
13 a023  1.0  3.0  2.0 p007 pn13
14 a024  1.0  3.0  2.0 p008 pn14

similar question就在这里。但不完全一样。

1 个答案:

答案 0 :(得分:1)

我确信这是有史以来最不优雅的解决方案之一。但它有效

require(dplyr)
Splitter <- function(rowid){
  TMP <- dat2 %>% filter(ID==rowid) 
  if(TMP$Dist>2){
    TMP1 <- c(rep(x = 2,times=trunc(TMP$Dist/2,0)), TMP$Dist%%2)
    TMP1 <- TMP1[TMP1!=0]
    TMP2 <- data.frame(ID=TMP$ID,LT=TMP$LT,LF=TMP$LF,Dist=TMP1,TM=TMP$TM,NM=TMP$NM)
    for(i in nrow(TMP2):2){
      TMP2[i,2] <- TMP2[i,3] - TMP2[i,4]
      TMP2[i-1,3] <- TMP2[i,2]
    }
  }
  else {TMP2 <- TMP}
  return(TMP2)
  }
FinalDF <- bind_rows(lapply(unique(dat2$ID),Splitter))

> FinalDF
     ID   LT   LF Dist   TM   NM
1  a001  0.0  2.0  2.0 p001 pn07
2  a001  2.0  3.0  1.0 p001 pn07
3  a003  4.1  4.9  0.8 p002 pn08
4  a004  2.1  4.1  2.0 p003 pn09
5  a004  4.1  6.1  2.0 p003 pn09
6  a004  6.1  6.6  0.5 p003 pn09
7  a008  8.7 10.7  2.0 p004 pn10
8  a008 10.7 12.0  1.3 p004 pn10
9  a009  2.0  4.0  2.0 p005 pn11
10 a009  4.0  4.3  0.3 p005 pn11
11 a010  1.0  3.0  2.0 p006 pn12
12 a010  3.0  4.0  1.0 p006 pn12
13 a023  1.0  3.0  2.0 p007 pn13
14 a024  1.0  3.0  2.0 p008 pn14