How to select rows with an exact match in R

Time: 2016-06-01 10:45:19

Tags: r

I have the following table:

Id      version     .net version
12886033    1       v2.0.50727
12886033    2       v3.0
12886033    3       v3.5
12886033    4       v4.0
12887578    1       v2.0.50727
12887578    2       v3.0
12887578    3       v3.5
12887578    4       v4.0
12888639    4       v4.0
12888676    4       v4.0

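I want to select the records that have only one .net version installed, by supplying the version number I am interested in. So if I give the .net version "v4.0", it should return 12888639 and 12888676, but not 12886033 or 12887578, because those have all versions installed. How can I achieve this in an R script?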

5 answers:

Answer 0 (score: 2)

Using dplyr:

library(dplyr)

# define current net
myCurrentNet <- "v4.0"

# Group by Id, filter if the group by count is 1 AND net_version matches current net
df1 %>% 
  group_by(Id) %>% 
  filter(n() == 1 & net_version == myCurrentNet)

# output
#         Id version net_version
#      (int)   (int)      (fctr)
# 1 12888639       4        v4.0
# 2 12888676       4        v4.0

# dummy data (define df1 before running the pipeline above)
df1 <- read.table(text = "Id      version     net_version
12886033    1       v2.0.50727
12886033    2       v3.0
12886033    3       v3.5
12886033    4       v4.0
12887578    1       v2.0.50727
12887578    2       v3.0
12887578    3       v3.5
12887578    4       v4.0
12888639    4       v4.0
12888676    4       v4.0", header = TRUE)

Answer 1 (score: 1)

Here is an option using data.table:

library(data.table)
setDT(df1)[df1[, .I[.N==1 & net_version ==myCurrentNet], Id]$V1]
#         Id version net_version
#1: 12888639       4        v4.0
#2: 12888676       4        v4.0

where

myCurrentNet <- "v4.0"

Answer 2 (score: 0)

Would something like

tmp <- data.frame(Id = yourTable$Id, cnt = rep(1,nrow(yourTable)))
tmp <- aggregate(x = tmp$cnt, by=list(tmp$Id), FUN=sum)
yourTable$numberOfVersions <- rep(NA,nrow(yourTable))
yourTable$numberOfVersions <- tmp$x[match(yourTable$Id,tmp$Group.1)]

res <- yourTable$Id[which(yourTable[,".net version"] == "v4.0" & yourTable$numberOfVersions == 1)]

work for you?
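
A self-contained way to try this idea is sketched below. It assumes the table is held in a data frame called `yourTable`, built here by hand with `check.names = FALSE` so that the column name `.net version` keeps its space. It also swaps the `aggregate` step for `table()`, which likewise counts rows per Id.

```r
# Sample data; check.names = FALSE keeps the ".net version" column name as-is
yourTable <- data.frame(
  Id = c(rep(12886033L, 4), rep(12887578L, 4), 12888639L, 12888676L),
  version = c(1:4, 1:4, 4L, 4L),
  ".net version" = c(rep(c("v2.0.50727", "v3.0", "v3.5", "v4.0"), 2), "v4.0", "v4.0"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Count the rows (installed versions) per Id
cnt <- table(yourTable$Id)

# Attach each row's per-Id count; names(cnt) are the Ids as character strings
yourTable$numberOfVersions <- as.vector(cnt[as.character(yourTable$Id)])

# Ids whose only installed version is v4.0
res <- yourTable$Id[yourTable[, ".net version"] == "v4.0" &
                      yourTable$numberOfVersions == 1]
res
```

With the sample rows from the question, `res` should contain only the two Ids that have a single v4.0 row.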

Answer 3 (score: 0)

I think you are looking for the following output:

df1 <- read.table(header = TRUE, text = "Id version net.version
12886033 1 v2.0.50727
12886033 2 v3.0
12886033 3 v3.5
12886033 4 v4.0
12887578 1 v2.0.50727
12887578 2 v3.0
12887578 3 v3.5
12887578 4 v4.0
12888639 4 v4.0
12888676 4 v4.0")

# Ids with all four versions sum to 1 + 2 + 3 + 4 = 10; keep the Ids whose sum differs
y <- aggregate(df1$version, by = list(df1$Id), FUN = sum)
z <- y[y$x != 10, ]
z$Group.1

Output:

[1] 12888639 12888676
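
Note that a check against a sum of 10 depends on this particular data (1 + 2 + 3 + 4 = 10) and never looks at which version an Id actually has. A more general sketch keeps an Id only when every one of its rows matches the requested version. It assumes the same columns as above, and `wanted` is a hypothetical variable holding that version:

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Id version net.version
12886033 1 v2.0.50727
12886033 2 v3.0
12886033 3 v3.5
12886033 4 v4.0
12887578 1 v2.0.50727
12887578 2 v3.0
12887578 3 v3.5
12887578 4 v4.0
12888639 4 v4.0
12888676 4 v4.0")

wanted <- "v4.0"  # hypothetical: the version to match

# For each Id, TRUE only if all of its rows carry the wanted version
onlyWanted <- tapply(df1$net.version == wanted, df1$Id, all)

# The Ids are stored as the names of the result
ids <- as.integer(names(onlyWanted)[onlyWanted])
ids
```

Unlike a row count of 1, this also accepts an Id that lists the wanted version more than once, which may or may not be what you want.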

Answer 4 (score: 0)

Here is another base R answer using ave:

# count the number of versions for each ID
df$versCnt <- ave(df$version, df$Id, FUN=length)

# return the IDs that only have version 4
df[df$versCnt == 1 & df$net.version == "v4.0", "Id"]

Alternatively, you can use the with function:

with(df, df[versCnt == 1 & net.version == "v4.0", "Id"])

Data

df <- read.table(header=T, text="Id    version   net.version
12886033    1   v2.0.50727
12886033    2   v3.0
12886033    3   v3.5
12886033    4   v4.0
12887578    1   v2.0.50727
12887578    2   v3.0
12887578    3   v3.5
12887578    4   v4.0
12888639    4   v4.0
12888676    4   v4.0")
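
To make the requested version a parameter, as the question asks, the `ave` approach can be wrapped in a small function. This is only a sketch: `idsWithOnly` is a hypothetical name, and it assumes a data frame with the `Id`, `version`, and `net.version` columns shown under Data (rebuilt inline here so the snippet runs on its own).

```r
# Same rows as the data above, built inline
df <- data.frame(
  Id = c(rep(12886033L, 4), rep(12887578L, 4), 12888639L, 12888676L),
  version = c(1:4, 1:4, 4L, 4L),
  net.version = c(rep(c("v2.0.50727", "v3.0", "v3.5", "v4.0"), 2), "v4.0", "v4.0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Ids that have exactly one row, and that row is `target`
idsWithOnly <- function(data, target) {
  versCnt <- ave(data$version, data$Id, FUN = length)
  data[versCnt == 1 & data$net.version == target, "Id"]
}

idsWithOnly(df, "v4.0")
idsWithOnly(df, "v3.0")  # no Id has v3.0 alone, so this is empty
```

Passing the data frame in as an argument avoids adding a helper column such as `versCnt` to the original table.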