I have a data frame like this, with more than 100 columns:
ID regulation press treat
1001 test1 0.2 b
1001 test1 1 c
1002 test2 2 s
1002 test2 3 s
1004 test1 4 s
1004 test1 5 f
1005 test2 6 w
1006 test2 6 u
1006 test2 1 h
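Each ID has exactly one regulation, and only two regulations are possible in the database (test1 and test2).
I basically want to count the unique IDs for each regulation.
Expected output: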
test1: 2
test2: 3
This means test1 occurs in 2 unique IDs and test2 occurs in 3 unique IDs.
Answer 0 (score: 0)
Try
rowSums(table(unique(df[, c("regulation", "ID")])))
# test1 test2
# 2 3
Or
table(unique(df[,c('regulation', 'ID')])[,"regulation"])
# test1 test2
# 2 3
Or using dplyr
library(dplyr)
count(unique(select(df, ID, regulation)), regulation)
# written with the %>% pipe, the code above would be
#df %>%
#  select(ID, regulation) %>%
#  unique() %>%
#  count(regulation)
# regulation n
#1 test1 2
#2 test2 3
df <- structure(list(ID = c(1001L, 1001L, 1002L, 1002L, 1004L, 1004L,
1005L, 1006L, 1006L), regulation = c("test1", "test1", "test2",
"test2", "test1", "test1", "test2", "test2", "test2"), press = c(0.2,
1, 2, 3, 4, 5, 6, 6, 1), treat = c("b", "c", "s", "s", "s", "f",
"w", "u", "h")), .Names = c("ID", "regulation", "press", "treat"
), class = "data.frame", row.names = c(NA, -9L))
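Another base R option is to count the distinct IDs per group directly, skipping the intermediate table. The following is a minimal sketch, assuming only the ID and regulation columns matter; the data frame here is a cut-down copy of the example data and `res` is just an illustrative name.

```r
# Cut-down copy of the example data (only the two columns that matter)
df <- data.frame(
  ID = c(1001L, 1001L, 1002L, 1002L, 1004L, 1004L, 1005L, 1006L, 1006L),
  regulation = c("test1", "test1", "test2", "test2", "test1", "test1",
                 "test2", "test2", "test2"),
  stringsAsFactors = FALSE
)

# For each regulation, count how many distinct IDs appear
res <- tapply(df$ID, df$regulation, function(x) length(unique(x)))
res
# test1 test2
#     2     3
```

Since it only touches two columns, this should behave the same on the full data frame with 100+ columns.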
Answer 1 (score: 0)
In addition to akrun's answer, which is very good, you can also do this easily with the data.table approach.
library(data.table)
dt <- data.table(df)
# group ID by regulation, drop duplicate (regulation, ID) pairs, then count per regulation
rowSums(table(unique(dt[, ID, by = regulation])))
#test1 test2
#2 3
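A possibly more direct data.table variant is to count distinct IDs inside each group with `uniqueN()`. This is a sketch under the same assumptions as above; the column name `n` is an arbitrary choice, and the data is again a cut-down copy of the example.

```r
library(data.table)

# Cut-down copy of the example data (only the two columns that matter)
dt <- data.table(
  ID = c(1001L, 1001L, 1002L, 1002L, 1004L, 1004L, 1005L, 1006L, 1006L),
  regulation = c("test1", "test1", "test2", "test2", "test1", "test1",
                 "test2", "test2", "test2")
)

# One row per regulation, with n = number of distinct IDs in that group
res <- dt[, .(n = uniqueN(ID)), by = regulation]
res
# test1 has n = 2, test2 has n = 3
```

This returns a data.table rather than a named vector, which may be more convenient if the counts are joined back to other data.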