Example df:
experiment = c("A", "A", "A", "A", "A", "B", "B", "B")
count = c(1,2,3,4,5,1,2,1)
df = cbind.data.frame(experiment, count)
Desired output:
experiment_1 = c("A", "A", "A", "B", "B")
freq = c(1,1,3,2,1) # frequency
freq_per = c(20,20,60,66.6,33.3) # frequency percent
df_1 = cbind.data.frame(experiment_1, freq, freq_per)
I want to do the following:
I have the code below. How do I perform step 4?
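library(dplyr) # provides %>%, group_by(), summarize(), mutate(), n()
freq_count = df %>%
  group_by(experiment, count) %>%
  summarize(freq = n()) %>% # one row per experiment/count pair
  na.omit() %>%
  mutate(freq_per = freq / sum(freq) * 100) # percent within each experiment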
Thank you very much.
Answer 0 (score: 1)
There may be a more concise way, but I suggest using mutate() and ifelse() to collapse your counts into a new column, and then summarising:
freq_count %>%
  mutate(collapsed_count = ifelse(count >= 3, 3, count)) %>%
  group_by(collapsed_count, .add = TRUE) %>% # adds a 2nd grouping var (`.add` replaces the deprecated `add` as of dplyr 1.0.0)
  summarise(freq = sum(freq), freq_per = sum(freq_per)) %>%
  select(-collapsed_count) # dropped to match your df_1
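Also, FYI, if you want to save some keystrokes, consider using the count() function for step 2. And tibble() or data.frame() is probably a better choice for creating a data frame than explicitly calling the data.frame method of cbind.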
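As a rough sketch of those two suggestions, here is one way the example data and the frequency table could be built with tibble() and count(). It assumes a dplyr version where count() accepts the name argument (0.8.0 or later), and it leaves out the na.omit() call because the example data has no missing values.

```r
library(dplyr)

# tibble() builds the example data directly, without cbind.data.frame()
df <- tibble(
  experiment = c("A", "A", "A", "A", "A", "B", "B", "B"),
  count      = c(1, 2, 3, 4, 5, 1, 2, 1)
)

# count() stands in for group_by() + summarize(freq = n());
# name = "freq" only renames the default output column `n`
freq_count <- df %>%
  count(experiment, count, name = "freq") %>%
  group_by(experiment) %>%                   # percentages are per experiment
  mutate(freq_per = freq / sum(freq) * 100)

print(freq_count)
```

The result is still grouped by experiment, so the mutate() / group_by() / summarise() pipeline above can be applied to it unchanged.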