I have a data frame (df) that looks like this:
X1 Category total.count
100279 A1 1
100279 A2 1
100279 A3 1
100279 A4 1
100280 A1 1
100280 A2 4
100281 A1 1
100281 A2 1
100282 A1 7
100283 A2 1
100283 A3 1
100283 A4 1
I want to extract a list of the ID numbers whose total.count values are all 1:
X1
100279
100281
100283
I tried:
df2 <- df[df$total.count == 1, ]
But it only returns the rows where total.count equals 1:
X1 total.count
100279 1
100279 1
100279 1
100279 1
100280 1
100281 1
100281 1
100283 1
100283 1
100283 1
Any ideas?
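To run the answers below, the sample data can be rebuilt from the table above. This is a sketch typed in from the question; the column types (numeric X1, character Category) are assumptions, since the question does not show str(df).

```r
# Sample data from the question (column types are assumed, not given in the post)
df <- data.frame(
  X1 = c(100279, 100279, 100279, 100279, 100280, 100280,
         100281, 100281, 100282, 100283, 100283, 100283),
  Category = c("A1", "A2", "A3", "A4", "A1", "A2",
               "A1", "A2", "A1", "A2", "A3", "A4"),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1),
  stringsAsFactors = FALSE  # keep Category as character on R < 4.0 too
)
str(df)
```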
Answer 0 (score: 1)
Try the following.
SELECT * INTO #MY_TEMP
FROM
(
SELECT TOP 40 *
FROM SOME_TABLE
ORDER BY RECORD_DATE DESC
) AS recent
Answer 1 (score: 1)
For a data.table approach, you can do:
library(data.table)
setDT(df)[, which(all(total.count==1)), by=X1]
X1 V1
1: 100279 1
2: 100281 1
3: 100283 1
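The which(all(...)) idiom works because which() returns 1 for a single TRUE and a zero-length integer vector for FALSE, and data.table adds no rows for a group whose j result is empty. That is why only the all-ones IDs appear, each with V1 = 1. The base R sketch below reproduces the per-group step with split() and lapply() so that data.table is not needed; the variable names per_group and kept are illustrative.

```r
# Only the two columns used here (X1 assumed numeric)
df <- data.frame(
  X1 = c(100279, 100279, 100279, 100279, 100280, 100280,
         100281, 100281, 100282, 100283, 100283, 100283),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# Evaluate the same j expression once per X1 group
per_group <- lapply(split(df$total.count, df$X1),
                    function(x) which(all(x == 1)))

# which(TRUE) is 1L, which(FALSE) is integer(0): keep the non-empty groups
kept <- as.numeric(names(per_group)[lengths(per_group) > 0])
print(kept)
```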
Also, if df$total.count never contains 0 (that is, every count is at least 1), you can use:
setDT(df)[, which(sum(total.count)==length(total.count)), by=X1]
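The restriction matters because comparing the sum to the length only checks that a group's counts average to 1. A group that contains a 0 can pass the test without containing any 1s. The vector below is a made-up example, not taken from the question's data.

```r
# Hypothetical group: one count of 0 and one count of 2
counts <- c(0, 2)
sum(counts) == length(counts)  # TRUE, although no value equals 1
all(counts == 1)               # FALSE, the test the first answer uses
```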
Answer 2 (score: 1)
A base R approach using ave to find the groups where all values are == 1:
unique(df[ave(df$total.count==1, df$X1, FUN=all),"X1"])
#[1] 100279 100281 100283
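ave() applies FUN = all to each X1 group and copies that group's single result back onto every row of the group. The result is a logical vector as long as df, which can be used directly as a row filter. The sketch below keeps that intermediate vector visible. The names keep and wanted are illustrative, and only the two relevant columns of the sample data are rebuilt.

```r
df <- data.frame(
  X1 = c(100279, 100279, 100279, 100279, 100280, 100280,
         100281, 100281, 100282, 100283, 100283, 100283),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# One TRUE/FALSE per row: TRUE when every row in that row's X1 group has count 1
keep <- ave(df$total.count == 1, df$X1, FUN = all)
print(keep)

# Passing rows still repeat their ID, hence unique()
wanted <- unique(df[keep, "X1"])
print(wanted)
```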
Answer 3 (score: 0)
A readable option, if you can use packages:
library(dplyr)
df %>%
group_by(X1) %>%
summarize(wanted = all(total.count == 1)) %>%
filter(wanted) %>%
select(X1) %>%
c()
$X1
[1] 100279 100281 100283
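The dplyr pipeline has two steps. For each X1 it checks whether all counts are 1, and then it keeps the IDs where that check is TRUE. If installing packages is not an option, the same two steps can be written with base R's tapply(). This is a base R substitute, not part of the original answer. Unlike the trailing c(), it returns a plain vector rather than a list. The name all_ones is illustrative.

```r
df <- data.frame(
  X1 = c(100279, 100279, 100279, 100279, 100280, 100280,
         100281, 100281, 100282, 100283, 100283, 100283),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# Step 1 (like group_by + summarize): one TRUE/FALSE per ID, named by ID
all_ones <- tapply(df$total.count == 1, df$X1, all)

# Step 2 (like filter + select): keep the IDs whose flag is TRUE
wanted <- as.numeric(names(all_ones)[all_ones])
print(wanted)
```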
If you prefer base R, here is one possibility:
unwanted <- as.integer(gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)))
unwanted
[1] 100280 100282
# Wanted IDs
unique( df$X1[! df$X1 %in% unwanted] )
[1] 100279 100281 100283
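The one-liner first collects the IDs that have at least one count other than 1 and then excludes them. Assuming X1 is numeric, as in the sample data, the same exclusion can be done with setdiff() and no conversion to and from strings. This is an alternative formulation, not the answer's own code, and the variable names are illustrative.

```r
df <- data.frame(
  X1 = c(100279, 100279, 100279, 100279, 100280, 100280,
         100281, 100281, 100282, 100283, 100283, 100283),
  total.count = c(1, 1, 1, 1, 1, 4, 1, 1, 7, 1, 1, 1)
)

# IDs with at least one row whose count is not 1
unwanted <- unique(df$X1[df$total.count != 1])

# All IDs minus the unwanted ones (setdiff() also drops duplicates)
wanted <- setdiff(df$X1, unwanted)
print(wanted)
```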
Unpacking the one-liner into logical steps:
# Condition for rows with the correct number
df$total.count == 1
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
# Combinations of ID + condition for each row
unique(paste(df$X1, df$total.count == 1, sep = ","))
[1] "100279,TRUE" "100280,TRUE" "100280,FALSE" "100281,TRUE" "100282,FALSE" "100283,TRUE"
# Failing combinations
grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE)
[1] "100280,FALSE" "100282,FALSE"
# ID numbers associated with failing combinations
gsub(',.*', '', grep('FALSE', unique(paste(df$X1, df$total.count == 1, sep = ",")), value = TRUE))
[1] "100280" "100282"