How to remove duplicates under certain conditions

Date: 2017-07-21 05:29:54

Tags: r rstudio data-science data-science-experience

This is an example of what I am trying to do on a different dataset, but it still doesn't work:

  PORT      STATUS     VESSEL          DWT   IMP/EXP  QTY (Mts)
1 KANDLA    SAILED     CAPTAIN HAMADA  7938  EXP      4500
2 KAKINADA  EXPECTED   CELON BREEZE          IMP      30000
3 KAKINADA  BERTH      CELON BREEZE          IMP      3000
4 KAKINADA  SAILED     CELON BREEZE          IMP      30000
5 KANDLA    ANCHORAGE  CAPTAIN HAMADA        EXP      4500
6 KAKINADA  BERTH      CELON BREEZE          IMP      30000

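I want to compare each row with the other rows on (PORT, VESSEL, IMP/EXP) and, if they match, delete one of them. If IMP/EXP is "IMP", the row to delete is chosen by the STATUS priority SAILED > BERTH > ANCHORAGE > EXPECTED. For example, SAILED takes priority over the other statuses, so row 2 is deleted because it matches row 4 on QTY, PORT and VESSEL. The other conditions work the same way once the rows match: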

  1) If STATUS = SAILED and the other row has BERTH, delete the BERTH row.
  2) If STATUS = SAILED and the other row has EXPECTED, delete the EXPECTED row.
  3) If one row has BERTH and the other has ANCHORAGE, delete the ANCHORAGE row.
  4) If one row has STATUS = EXPECTED and the other row has STATUS = SAILED, delete the EXPECTED row.

And so on. The rows must match on QTY, PORT and VESSEL; only then is a row deleted according to these conditions.

For EXP in IMP/EXP, the rows must match on the same fields (QTY, PORT, VESSEL), but the STATUS priority is different:

     priority: SAILED > ANCHORAGE > EXPECTED > BERTH
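The pairwise rules for both IMP and EXP can be sketched in base R as a lookup of each status's rank. This is a minimal illustration, assuming the two rows have already been found to match on PORT, VESSEL, IMP/EXP and QTY and that every STATUS is one of the four listed values; `status_priority` and `row_to_drop` are hypothetical helper names, not part of any package.

```r
# Hypothetical helper: STATUS priority from lowest to highest, per IMP/EXP value
status_priority <- list(
  IMP = c("EXPECTED", "ANCHORAGE", "BERTH", "SAILED"),
  EXP = c("BERTH", "EXPECTED", "ANCHORAGE", "SAILED")
)

# Hypothetical helper: given the STATUS of two matching rows (a and b) and
# their shared IMP/EXP value, return "a" or "b" for the row to delete.
# On a tie (same STATUS) row b is dropped and row a is kept.
row_to_drop <- function(status_a, status_b, impexp) {
  lev <- status_priority[[impexp]]
  rank_a <- match(status_a, lev)
  rank_b <- match(status_b, lev)
  if (rank_a >= rank_b) "b" else "a"
}

row_to_drop("SAILED", "BERTH", "IMP")     # rule 1: "b", the BERTH row
row_to_drop("EXPECTED", "SAILED", "IMP")  # rule 4: "a", the EXPECTED row
row_to_drop("BERTH", "ANCHORAGE", "IMP")  # rule 3: "b", the ANCHORAGE row
row_to_drop("SAILED", "ANCHORAGE", "EXP") # EXP order: "b", the ANCHORAGE row
```

Keeping the two orders in one named list means adding a new IMP/EXP category or a new status only touches the data, not the comparison logic.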

The OUTPUT should be:

  PORT      STATUS     VESSEL          DWT   IMP/EXP  QTY (Mts)
1 KANDLA    SAILED     CAPTAIN HAMADA  7938  EXP      4500
3 KAKINADA  BERTH      CELON BREEZE          IMP      3000
4 KAKINADA  SAILED     CELON BREEZE          IMP      30000

That is, the desired output is the input with rows 2, 5 and 6 deleted.
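If you would rather avoid an extra package, the "keep the highest-priority row per group" idea can also be sketched in base R: give each row a numeric rank from its STATUS and IMP/EXP, sort so that the best row of each (PORT, VESSEL, IMP/EXP, QTY) group comes first, and drop the rest with `duplicated()`. This is an alternative technique, not the method used in the answer; it assumes the columns are named `IMPEXP` and `QTY` and that every STATUS is one of the four listed values.

```r
# Sample data from the question (DWT is only known for the first row)
test <- data.frame(
  PORT   = c("KANDLA", "KAKINADA", "KAKINADA", "KAKINADA", "KANDLA", "KAKINADA"),
  STATUS = c("SAILED", "EXPECTED", "BERTH", "SAILED", "ANCHORAGE", "BERTH"),
  VESSEL = c("CAPTAIN HAMADA", "CELON BREEZE", "CELON BREEZE",
             "CELON BREEZE", "CAPTAIN HAMADA", "CELON BREEZE"),
  DWT    = c(7938, NA, NA, NA, NA, NA),
  IMPEXP = c("EXP", "IMP", "IMP", "IMP", "EXP", "IMP"),
  QTY    = c(4500, 30000, 3000, 30000, 4500, 30000),
  stringsAsFactors = FALSE
)

# STATUS priority from lowest to highest for each IMP/EXP value
imp_order <- c("EXPECTED", "ANCHORAGE", "BERTH", "SAILED")
exp_order <- c("BERTH", "EXPECTED", "ANCHORAGE", "SAILED")

# Numeric priority of each row: higher means "keep this one"
test$rank <- ifelse(test$IMPEXP == "IMP",
                    match(test$STATUS, imp_order),
                    match(test$STATUS, exp_order))

keys <- c("PORT", "VESSEL", "IMPEXP", "QTY")

# Sort so that, within each key group, the highest-priority row comes first
sorted <- test[order(test$PORT, test$VESSEL, test$IMPEXP, test$QTY, -test$rank), ]

# Keep the first row of each group (dropping the helper rank column);
# the rows removed here are the duplicates to delete
result <- sorted[!duplicated(sorted[keys]), setdiff(names(sorted), "rank")]
result
```

The surviving rows are the question's rows 3, 4 and 1 (sorted by the key columns rather than in input order).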

1 Answer:

Answer 0 (score: 1)

First, you need to read the data into R as a data.frame. The data.frame `test` should look like this:

>test

#      PORT    STATUS         VESSEL  DWT IMPEXP   QTY
#1   KANDLA    SAILED CAPTAIN HAMADA 7938    EXP  4500
#2 KAKINADA  EXPECTED   CELON BREEZE   NA    IMP 30000
#3 KAKINADA     BERTH   CELON BREEZE   NA    IMP  3000
#4 KAKINADA    SAILED   CELON BREEZE   NA    IMP 30000
#5   KANDLA ANCHORAGE CAPTAIN HAMADA   NA    EXP  4500
#6 KAKINADA     BERTH   CELON BREEZE   NA    IMP 30000
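One way to get there, assuming the question's table is pasted in as text, is base R's `read.table(text = ...)`. In this sketch the multi-word vessel names are quoted so each stays in a single column, the missing DWT values are written as `NA`, and the columns are renamed to `IMPEXP` and `QTY` to match the printout above; all of these are choices made for illustration.

```r
# Question's table as text; quoted vessel names keep "CAPTAIN HAMADA" and
# "CELON BREEZE" in one column, and NA marks the unknown DWT values
raw <- '
PORT     STATUS    VESSEL           DWT  IMPEXP QTY
KANDLA   SAILED    "CAPTAIN HAMADA" 7938 EXP    4500
KAKINADA EXPECTED  "CELON BREEZE"   NA   IMP    30000
KAKINADA BERTH     "CELON BREEZE"   NA   IMP    3000
KAKINADA SAILED    "CELON BREEZE"   NA   IMP    30000
KANDLA   ANCHORAGE "CAPTAIN HAMADA" NA   EXP    4500
KAKINADA BERTH     "CELON BREEZE"   NA   IMP    30000
'

# Whitespace-separated columns; keep text columns as character, not factor
test <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)
test
```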

Using the `ddply` function from the plyr package, you should be able to get the desired output with the following function:

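library(plyr)

# STATUS priority from lowest to highest for each IMP/EXP value
imp_levels <- c("EXPECTED", "ANCHORAGE", "BERTH", "SAILED")
exp_levels <- c("BERTH", "EXPECTED", "ANCHORAGE", "SAILED")

ddply(test, .variables = c("PORT", "VESSEL", "IMPEXP", "QTY"),
  function(grp) {
    # Choose the priority order that applies to this group's IMP/EXP value
    lev <- if (grp$IMPEXP[1] == "IMP") imp_levels else exp_levels
    status <- factor(grp$STATUS, levels = lev, ordered = TRUE)
    # Keep only the row with the highest-priority STATUS in the group
    grp[which.max(as.integer(status)), ]
  }
)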

#      PORT STATUS         VESSEL  DWT IMPEXP   QTY
#1 KAKINADA  BERTH   CELON BREEZE   NA    IMP  3000
#2 KAKINADA SAILED   CELON BREEZE   NA    IMP 30000
#3   KANDLA SAILED CAPTAIN HAMADA 7938    EXP  4500