R: select one row from duplicated rows after comparing multiple conditions

Posted: 2015-09-17 15:57:20

Tags: r

I pulled these duplicated records out of a large dataset. Now I need to select one row from each set of duplicates.

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID,date,type,level)

The data frame looks like this: [image of the data frame]

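Here is the comparison I want. For each ID, if the dates differ, keep all of the rows in df.right. If the dates are the same, compare type and choose by the order LC > MC > YA > ST (e.g. choose MC over YA), and put the chosen row into df.right. If the type is also the same, compare level and choose by the order active > new > firsttime (e.g. choose new over firsttime), and put the chosen row into df.right.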

I tried using foreach, but this only covers the first step, and it doesn't work for IDs that have 3 duplicated rows.

library(foreach)
foreach(i = unique(data$ID), .combine = 'rbind') %do% {
  same_date <- data[data$ID == i, "date"][1] == data[data$ID == i, "date"][2]  # compares only the first two rows
  b <- data[data$ID == i, ]}
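The attempt above breaks for three rows because it only ever compares elements [1] and [2]. A minimal base-R sketch of a loop-style fix, assuming the question's sample vectors (rebuilt here with `stringsAsFactors = FALSE`): it loops over ID/date pairs with `split()` and `lapply()` instead of `foreach`, and the helper name `pick_best` is hypothetical. Because the grouping is on ID and date together, rows with different dates land in different groups and are all kept.

```r
# Sample data from the question
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760",
        "717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12",
          "2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01",
          "2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <- c("firsttime","new","new","active","active","active","firsttime",
           "new","active","new","active","new","active","active")
data <- data.frame(ID, date, type, level, stringsAsFactors = FALSE)

# Preference order from the question, most preferred first
type_pref  <- c("LC", "MC", "YA", "ST")
level_pref <- c("active", "new", "firsttime")

# One group per ID/date pair, however many rows it contains
groups <- split(data, list(data$ID, data$date), drop = TRUE)

# Hypothetical helper: lower rank = more preferred; type first, then level
pick_best <- function(g) {
  best <- order(match(g$type, type_pref), match(g$level, level_pref))[1]
  g[best, ]
}

df.right <- do.call(rbind, lapply(groups, pick_best))
df.right <- df.right[order(as.numeric(df.right$ID), df.right$date), ]
rownames(df.right) <- NULL
df.right
```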

The result should look like this: [image of the expected result]

Does anyone know how to do this? Any help is much appreciated. Thanks!

2 answers:

Answer 0 (score: 2)

The … package works well for this kind of thing.

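With factors, you can specify the order you want the categories to sort in. Then, for each unique ID/date pair, you can pick the first row by type and then by level.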
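A minimal base-R sketch of this factor approach (an illustration of the idea, not the answerer's own code; it assumes the question's sample vectors and uses no add-on package). The factor levels list the most-preferred category first, so `order()` puts the row to keep at the top of each ID/date pair, and `duplicated()` drops the rest.

```r
# Sample data from the question
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760",
        "717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12",
          "2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01",
          "2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <- c("firsttime","new","new","active","active","active","firsttime",
           "new","active","new","active","new","active","active")
data <- data.frame(ID, date, type, level, stringsAsFactors = FALSE)

# Factor levels encode the preference, most preferred first, so sorting
# follows LC, MC, YA, ST rather than alphabetical order
data$type  <- factor(data$type,  levels = c("LC", "MC", "YA", "ST"))
data$level <- factor(data$level, levels = c("active", "new", "firsttime"))

# Within each ID/date pair, the best type (then the best level) sorts first
sorted <- data[order(data$ID, data$date, data$type, data$level), ]

# Keep the first row of each ID/date pair; rows with other dates survive
df.right <- sorted[!duplicated(sorted[c("ID", "date")]), ]
df.right
```

Since the rows are already sorted by type and then by level, a single `duplicated()` pass on ID and date is enough in this sketch.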


Answer 1 (score: 1)

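The trick here is to order the levels of type and level as needed. Then de-duplicate twice: first remove duplicate rows based on the columns ID, date, type, then remove duplicates based on the first two columns: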

# List the most-preferred category first so that it sorts to the top
type = factor(type, levels=c("LC","MC","YA","ST"))
level = factor(level, levels=c("active","new","firsttime"))
data <- data.frame(ID,date,type,level)

# Within each ID/date group, the preferred row now comes first
d = with(data, data[order(ID, date, type, level),])
# Keep the best level per ID/date/type, then the best type per ID/date
e = d[!duplicated(d[,1:3]),]
df.right = e[!duplicated(e[,1:2]),]
df.right = df.right[order(as.numeric(as.character(df.right$ID)), df.right$date),]
df.right

Output:

       ID       date type     level
1    6820 2014-06-12   ST firsttime
2    6820 2015-06-11   ST       new
4   17413 2014-05-01   MC    active
5   38553 2014-06-12   LC    active
6   38553 2015-06-11   LC    active
8   52760 2014-10-24   YA       new
9  717841 2014-05-01   YA    active
11 717841 2014-12-02   MC    active
12 747187 2014-03-01   LC       new
13 747187 2014-05-12   LC    active
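The result of `duplicated()` is often used to index as `x[-which(dup), ]`. One caveat with that form: when `dup` is all `FALSE`, `which()` returns `integer(0)`, and negating an empty index selects no rows, so every row is dropped. A small illustration with made-up data (the data frame `x` is hypothetical); indexing with the logical vector, `x[!dup, ]`, behaves as intended.

```r
# Made-up example with no duplicated rows
x <- data.frame(ID = c("a", "b", "c"), date = c("d1", "d1", "d2"))
dup <- duplicated(x)   # FALSE FALSE FALSE

# which(dup) is integer(0), and x[-integer(0), ] selects zero rows
dropped_all <- x[-which(dup), ]
# The logical index keeps every non-duplicated row
kept <- x[!dup, ]

nrow(dropped_all)  # 0
nrow(kept)         # 3
```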