I have a data frame like this:
ID <- c("G110","G110","G110","G110","G110","G160","G160","G160",
"G180","G180","G180","G180","G180","G190","G190","G190")
Measurement <- c("Length","Length","Length","Breadth","Breadth","Length","Breadth","Length",
"Length","Length","Length","Length","Length","Breadth","Breadth","Breadth")
Category <- c("A","A","A","A","A","B","B","B",
"C","C","C","C","C","C","C","C")
df <- data.frame(ID, Measurement, Category)
I get the counts by ID, Measurement, and Category:
library(doBy)
UniqueCategory <- summaryBy(Category~ID+Measurement+Category, data = df,
FUN = function(x) { c(n = length(x)) } )
UniqueCategory
ID Measurement Category Category.n
1 G110 Breadth A 2
2 G110 Length A 3
3 G160 Breadth B 1
4 G160 Length B 2
5 G180 Length C 5
6 G190 Breadth C 3
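Now I have a threshold for each category that I want to apply to these counts, creating a column in df named Output:
if Category is A and the count > 2, then df$Output is True, else False
if Category is B and the count > 1, then df$Output is True, else False
if Category is C and the count > 4, then df$Output is True, else False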
The desired output for df looks like this:
ID Measurement Category Output
1 G110 Length A True
2 G110 Length A True
3 G110 Length A True
4 G110 Breadth A False
5 G110 Breadth A False
6 G160 Length B True
7 G160 Breadth B False
8 G160 Length B True
9 G180 Length C True
10 G180 Length C True
11 G180 Length C True
12 G180 Length C True
13 G180 Length C True
14 G190 Breadth C False
15 G190 Breadth C False
16 G190 Breadth C False
How can I make this work? I tried using if statements but could not get it right. Please give me some pointers.
Answer 0 (score: 1)
Here is a solution using dplyr:
library(dplyr)
df %>%
  group_by(ID, Measurement, Category) %>%
  mutate(Category.n = n()) %>%   # rows per ID/Measurement/Category group
  ungroup() %>%
  # the comparison already yields TRUE/FALSE, so ifelse() is not needed
  mutate(Output = (Category == "A" & Category.n > 2) |
                  (Category == "B" & Category.n > 1) |
                  (Category == "C" & Category.n > 4))
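
If more categories are added later, spelling out one condition per category gets unwieldy. As an alternative, here is a minimal base R sketch that keeps the thresholds in a named vector and looks them up by category. The names thresholds and n are my own and do not come from the question, and the lookup assumes every value of Category has an entry in thresholds.

```r
# Rebuild the example data from the question.
ID <- c("G110","G110","G110","G110","G110","G160","G160","G160",
        "G180","G180","G180","G180","G180","G190","G190","G190")
Measurement <- c("Length","Length","Length","Breadth","Breadth","Length","Breadth","Length",
                 "Length","Length","Length","Length","Length","Breadth","Breadth","Breadth")
Category <- c("A","A","A","A","A","B","B","B",
              "C","C","C","C","C","C","C","C")
df <- data.frame(ID, Measurement, Category, stringsAsFactors = FALSE)

# Hypothetical lookup table: one threshold per category (values from the question).
thresholds <- c(A = 2, B = 1, C = 4)

# Number of rows in each ID/Measurement/Category group, repeated on every row.
df$n <- ave(seq_along(df$ID), df$ID, df$Measurement, df$Category, FUN = length)

# Compare each row's group count with the threshold for its category.
# as.character() guards against Category being a factor.
df$Output <- df$n > unname(thresholds[as.character(df$Category)])

df[, c("ID", "Measurement", "Category", "Output")]
```

With this layout, changing or adding a threshold only means editing the thresholds vector.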