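I have a data frame made up of three categorical variables. I want to find the frequency of each combination and sort the result by frequency in descending order, like this:
My data: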
A LEVEL1 PASS
A LEVEL1 FAIL
B LEVEL2 PASS
A LEVEL1 PASS
B LEVEL2 PASS
A LEVEL1 PASS
Desired result:
A LEVEL1 PASS 3
B LEVEL2 PASS 2
A LEVEL1 FAIL 1
I'm using the plyr library:
myfreq<-count(myresult,vars = NULL, wt_var = NULL)
myfreq<-myfreq[order(-myfreq$freq),]
At first it worked, but now it just gives me this error:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `vars` is unknown
The other libraries I'm using are rJava and dplyr.
Thanks.
Answer 0 (score: 1)
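A likely cause of the error: dplyr also exports a function called count(), and whichever package is attached last masks the other (R prints a "The following objects are masked" message when this happens). Once dplyr is attached after plyr, count(myresult, vars = NULL, wt_var = NULL) goes to dplyr::count(), which has no vars argument and appears to pass vars = NULL on to its grouping step as if it were a column, hence "Column `vars` is unknown". A minimal sketch, assuming plyr is installed and using made-up column names, that calls plyr's version explicitly:

```r
# Hypothetical column names; the question does not say what they are called
myresult <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  result = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"),
  stringsAsFactors = FALSE
)

# Qualify the call with plyr:: so that a later library(dplyr) cannot mask it;
# with vars = NULL, plyr::count() counts every combination of all columns
myfreq <- plyr::count(myresult)

# order(-x) sorts by frequency, highest first
myfreq <- myfreq[order(-myfreq$freq), ]
myfreq
```

Writing dplyr::count() in the same way avoids the clash in the other direction.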
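I'd suggest using dplyr, which is included in the tidyverse.
I don't know what the columns in your data frame are called, so in the example below I've named them col1, col2 and col3.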
library(tidyverse)
df <- tribble(
  ~col1, ~col2,    ~col3,
  "A",   "LEVEL1", "PASS",
  "A",   "LEVEL1", "FAIL",
  "B",   "LEVEL2", "PASS",
  "A",   "LEVEL1", "PASS",
  "B",   "LEVEL2", "PASS",
  "A",   "LEVEL1", "PASS")
# here is where the magic happens
df %>% count(col1, col2, col3, sort = TRUE)
Answer 1 (score: 1)
You can use group_by() from dplyr:
library(dplyr)
x <- data.frame(letter = c("A", "A", "B", "A", "B", "A"), level = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"), text = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
df <- x %>%
group_by_all() %>%
count()
Alternatively, you can list the grouping columns explicitly:
df <- x %>%
group_by(letter, level, text) %>%
count()
Output:
> df <- x %>% group_by_all() %>% count()
> df
# A tibble: 3 x 4
# Groups: letter, level, text [3]
letter level text n
<fctr> <fctr> <fctr> <int>
1 A LEVEL 1 FAIL 1
2 A LEVEL 1 PASS 3
3 B LEVEL 2 PASS 2
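The output above is still grouped and is not sorted by frequency, while the question asks for descending order. A minimal sketch of one way to finish the job with the same x: ungroup() drops the grouping that count() leaves behind, and arrange(desc(n)) puts the most frequent combination first. The name df_sorted is only illustrative.

```r
library(dplyr)

x <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"),
  text   = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS")
)

df_sorted <- x %>%
  group_by(letter, level, text) %>%
  count() %>%
  ungroup() %>%      # remove the grouping kept by count()
  arrange(desc(n))   # highest frequency first
df_sorted
```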
Answer 2 (score: 0)
You can use the table() function:
ex <- data.frame("letter" = c("A", "A", "B", "A", "B", "A"),
"level" = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
"test" = c("PASS", "FAIL", rep("PASS", 4)))
ex
# Cross-tabulate level against test (letter is left out here)
res <- data.frame(table(ex$level, ex$test))
colnames(res) <- c("level", "test", "freq")
You can then merge the resulting data frame back with the original data.
Answer 3 (score: 0)
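The merge step is not shown in the answer; a minimal base-R sketch of one way to do it, assuming (as in this data) that each level belongs to exactly one letter. It also drops the zero-count rows that table() produces for combinations that never occur and sorts by frequency, as the question asks.

```r
# Same data as above, with character (not factor) columns
ex <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  test   = c("PASS", "FAIL", rep("PASS", 4)),
  stringsAsFactors = FALSE
)

# Count each level/test combination, keeping the labels as character
res <- as.data.frame(table(ex$level, ex$test), stringsAsFactors = FALSE)
colnames(res) <- c("level", "test", "freq")

# table() also lists combinations that never occur (LEVEL2/FAIL here); drop them
res <- res[res$freq > 0, ]

# Bring letter back by joining on level
# (assumes each level maps to exactly one letter)
res <- merge(unique(ex[c("letter", "level")]), res, by = "level")

# Restore the original column order and sort by frequency, highest first
res <- res[order(-res$freq), c("letter", "level", "test", "freq")]
res
```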
Here is a tidyverse solution using n():
library(dplyr)
df <- tibble(
id = c("A", "A", "B", "A", "B", "A"),
level = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
type = factor(c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
)
df %>%
group_by(id, level, type) %>%
summarise(n = n()) %>%
arrange(desc(n))
# A tibble: 3 x 4
# Groups: id, level [?]
id level type n
<chr> <chr> <fctr> <int>
1 A LEVEL1 PASS 3
2 B LEVEL2 PASS 2
3 A LEVEL1 FAIL 1