How to group rows and extract IDs by a condition

Time: 2017-07-24 20:56:54

Tags: r

I have a data frame (df) that looks like this:

 X1        Category  total.count 
100279         A1        1 
100279         A2        1
100279         A3        1
100279         A4        1
100280         A1        1
100280         A2        4
100281         A1        1
100281         A2        1
100282         A1        7
100283         A2        1
100283         A3        1
100283         A4        1

I want to extract a list of the ID numbers for which every value of total.count is 1.

 X1
 100279
 100281
 100283

I tried:

df2 = df[total.count == 1]

But that just returns the same data, restricted to the rows where total.count equals 1:

  X1      total.count 
100279    1 
100279    1
100279    1
100279    1
100280    1
100281    1
100281    1
100283    1
100283    1
100283    1

Any ideas?

4 answers:

Answer 0: (score: 1)

Please try the following.

SELECT * INTO #MY_TEMP
FROM
  (
    SELECT TOP 40 *
    FROM SOME_TABLE
    ORDER BY RECORD_DATE DESC
  ) AS recent_rows  -- alias added: T-SQL requires one on a derived table

Answer 1: (score: 1)

For a data.table approach, you can do:

library(data.table)

setDT(df)[, which(all(total.count==1)), by=X1]

       X1 V1
1: 100279  1
2: 100281  1
3: 100283  1
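
If a plain vector of IDs is more convenient than the two-column table above, the same grouping idea can be sketched as follows. This is a minimal sketch, assuming the data.table package is installed; df is rebuilt here from the question's table, and the names dt, flags and ids are illustrative only.

```r
library(data.table)

# Rebuild the question's data frame so the example is self-contained
df <- data.frame(
  X1 = rep(c(100279, 100280, 100281, 100282, 100283), times = c(4, 2, 2, 1, 3)),
  Category = c("A1", "A2", "A3", "A4", "A1", "A2", "A1", "A2", "A1", "A2", "A3", "A4"),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# Copy into a data.table (as.data.table leaves df itself untouched)
dt <- as.data.table(df)

# One logical flag per ID: TRUE only when every count in the group is 1
flags <- dt[, .(all_ones = all(total.count == 1)), by = X1]

# Keep the IDs whose flag is TRUE, returned as a plain vector
ids <- flags[all_ones == TRUE, X1]
ids
```

Unlike setDT(df), as.data.table makes a copy, so the original data frame is not converted by reference.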

Also, if df$total.count has no 0s (that is, every count is a positive integer), you can use:

setDT(df)[, which(sum(total.count)==length(total.count)), by=X1]
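
The reason the shortcut needs that assumption can be seen with a made-up group. The vector counts below is hypothetical and does not come from the question; it is only a small base R sketch of how the sum test and the all() test can disagree.

```r
# Hypothetical group containing a zero (not part of the question's data)
counts <- c(0, 2)

# Shortcut: the sum equals the number of rows, so it reports a match
shortcut <- sum(counts) == length(counts)

# Strict test: not every value is 1, so it reports no match
strict <- all(counts == 1)

shortcut  # TRUE
strict    # FALSE
```

When zeros are possible, all(total.count == 1) is the safer condition.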

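Answer 2: (score: 1)

A base R approach using ave to find the groups in which all values are == 1 (dat here is the data frame called df in the question):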

unique(dat[ave(dat$total.count==1, dat$X1, FUN=all),"X1"])
#[1] 100279 100281 100283
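
A closely related base R variant uses tapply, which returns one value per group instead of one value per row. This is only a sketch under the assumption that the data matches the question's table; the names all_ones and ids are illustrative.

```r
# Rebuild the question's data frame so the example is self-contained
df <- data.frame(
  X1 = rep(c(100279, 100280, 100281, 100282, 100283), times = c(4, 2, 2, 1, 3)),
  Category = c("A1", "A2", "A3", "A4", "A1", "A2", "A1", "A2", "A1", "A2", "A3", "A4"),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# One TRUE/FALSE per ID, named by the ID
all_ones <- tapply(df$total.count == 1, df$X1, all)

# The names are character strings, so convert the kept ones back to numbers
ids <- as.numeric(names(all_ones)[as.vector(all_ones)])
ids
# expected: 100279 100281 100283
```

Because tapply already collapses to one entry per group, no call to unique is needed.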

Answer 3: (score: 0)

Using dplyr

A readable option, if you are able to use packages:

library(dplyr)

df %>%
    group_by(X1) %>%
    summarize(wanted = all(total.count == 1)) %>%
    filter(wanted) %>%
    select(X1) %>%
    c()

$X1
[1] 100279 100281 100283
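
If an atomic vector is preferred over the one-element list that c() produces, a grouped filter followed by pull is one possible alternative. This is a sketch, assuming dplyr is installed; df is rebuilt from the question's table and ids is an illustrative name.

```r
library(dplyr)

# Rebuild the question's data frame so the example is self-contained
df <- data.frame(
  X1 = rep(c(100279, 100280, 100281, 100282, 100283), times = c(4, 2, 2, 1, 3)),
  Category = c("A1", "A2", "A3", "A4", "A1", "A2", "A1", "A2", "A1", "A2", "A3", "A4"),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

ids <- df %>%
    group_by(X1) %>%
    filter(all(total.count == 1)) %>%  # keep only groups where every count is 1
    ungroup() %>%
    distinct(X1) %>%                   # one row per remaining ID
    pull(X1)                           # extract the column as a plain vector
ids
```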

Or, base R

If you prefer to stick to base R, here is one possibility:

unwanted <- as.integer(gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)))

unwanted
[1] 100280 100282

# Wanted IDs
unique( df$X1[! df$X1 %in% unwanted] )
[1] 100279 100281 100283

Breaking the one-liner down into logical steps:

# Condition for rows with the correct number
df$total.count == 1
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

# Combinations of ID + condition for each row
unique(paste(df$X1, df$total.count == 1, sep = ","))
[1] "100279,TRUE"  "100280,TRUE"  "100280,FALSE" "100281,TRUE"  "100282,FALSE" "100283,TRUE" 

# Failing combinations
grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)
[1] "100280,FALSE" "100282,FALSE"

# ID numbers associated with failing combinations
gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE))
[1] "100280" "100282"
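
The same "drop every ID that has a failing row" idea can also be sketched without string pasting, using setdiff. This is an illustrative alternative rather than the answer's own method; df is rebuilt from the question's table, and unwanted and wanted are names chosen to mirror the steps above.

```r
# Rebuild the question's data frame so the example is self-contained
df <- data.frame(
  X1 = rep(c(100279, 100280, 100281, 100282, 100283), times = c(4, 2, 2, 1, 3)),
  Category = c("A1", "A2", "A3", "A4", "A1", "A2", "A1", "A2", "A1", "A2", "A3", "A4"),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# IDs that have at least one row where total.count is not 1
unwanted <- unique(df$X1[df$total.count != 1])

# All IDs minus the unwanted ones; setdiff also removes duplicates
wanted <- setdiff(df$X1, unwanted)
wanted
# expected: 100279 100281 100283
```

Keeping the IDs numeric throughout avoids the as.integer and gsub round trip.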