Consider the following data frame:
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))
The contingency table looks like this:
cont_tab <- table(data$col1, data$col2, dnn = c("col1", "col2"))
cont_tab
    col2
col1 A B C
   A 4 0 0
   B 1 3 0
   C 1 0 3
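As you can see, some pairs never occur: (A,B), (A,C), (B,C), (C,B). The ultimate goal of my analysis is to list all pairs (9 in this case) and show a statistic for each of them. When using dplyr::group_by() I ran into a limitation: dplyr::group_by() only considers existing pairs (those that occur at least once):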
library(dplyr)
data %>%
  group_by(col1, col2) %>%
  summarize(stat = sum(val2) - sum(val1))
# A tibble: 5 x 3
# Groups: col1 [?]
col1 col2 stat
<fct> <fct> <dbl>
1 A A 58.1
2 B A -16.4
3 B B 17.0
4 C A -12.9
5 C C -41.9
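The output I have in mind has 9 rows (4 of them with stat equal to 0). Is this possible with dplyr?
Edit: Sorry for being vague at first. The real problem is more complex than counting how many times a particular pair occurs. I added new data to make the actual problem clearer.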
Answer 0: (score: 4)
It is much easier to use count from dplyr and then spread from tidyr to get the same result as table:
library(dplyr)
library(tidyr)
count(data, col1, col2) %>%
  spread(col2, n, fill = 0)
# A tibble: 3 x 4
# Groups: col1 [3]
# col1 A B C
# <fct> <dbl> <dbl> <dbl>
#1 A 4 0 0
#2 B 1 3 0
#3 C 1 0 3
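Note: the group_by/summarise step is replaced with count here.
As @divibisan suggested, if the OP needs the long format, add gather at the end.
With the updated data from the OP's post: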
data %>%
  group_by(col1, col2) %>%
  summarize(stat = n()) %>%
  spread(col2, stat, fill = 0) %>%
  gather(col2, stat, A:C)
# A tibble: 9 x 3
# Groups: col1 [3]
# col1 col2 stat
# <fct> <chr> <dbl>
#1 A A 4
#2 B A 1
#3 C A 1
#4 A B 0
#5 B B 3
#6 C B 0
#7 A C 0
#8 B C 0
#9 C C 3
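In current tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is a minimal sketch of the same wide-then-long round trip with the newer functions, assuming tidyr 1.1.0 or later (for a scalar values_fill) and the example data from the question. The name res_long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))

res_long <- data %>%
  count(col1, col2) %>%
  # wide format: one column per col2 value, missing pairs filled with 0
  pivot_wider(names_from = col2, values_from = n, values_fill = 0) %>%
  # back to long format: one row per (col1, col2) pair
  pivot_longer(cols = -col1, names_to = "col2", values_to = "n")

res_long
```

Like the spread/gather version, this only recovers pairs whose col1 and col2 values each occur at least once somewhere in the data.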
Answer 1: (score: 3)
Even without dplyr:
as.data.frame(table(data$col1, data$col2, dnn = c("col1", "col2")))
# col1 col2 Freq
#1 A A 4
#2 B A 1
#3 C A 1
#4 A B 0
#5 B B 3
#6 C B 0
#7 A C 0
#8 B C 0
#9 C C 3
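The same base R idea can be stretched to the statistic from the question, because xtabs() sums a numeric left-hand side within each cell and sum(val2) - sum(val1) equals sum(val2 - val1). A minimal sketch, assuming the example data from the question (stat_tab and stat_df are illustrative names):

```r
# Example data from the question
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))

# Sum of (val2 - val1) for every col1/col2 cell; empty cells come out as 0
stat_tab <- xtabs(val2 - val1 ~ col1 + col2, data = data)

# Long format with one row per pair, including the pairs that never occur
stat_df <- as.data.frame(stat_tab, responseName = "stat")
stat_df
```

This only works for statistics that can be written as a per-cell sum.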
Answer 2: (score: 2)
You can use tidyr::complete:
library(tidyverse)
data %>%
  group_by(col1, col2) %>%
  summarize(stat = n()) %>%
  # additions below
  ungroup %>%
  complete(col1, col2, fill = list(stat = 0))
# # A tibble: 9 x 3
# col1 col2 stat
# <chr> <chr> <dbl>
# 1 A A 4
# 2 A B 0
# 3 A C 0
# 4 B A 1
# 5 B B 3
# 6 B C 0
# 7 C A 1
# 8 C B 0
# 9 C C 3
You can also use count for the first part, which gives the same output as the code above.
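The count-based code is missing from this copy of the answer. The following is a sketch of what such a variant might look like, not necessarily the answerer's exact code. It assumes the example data from the question; res_count is an illustrative name, and the count column is called n rather than stat.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))

res_count <- data %>%
  # count() replaces group_by() + summarize(n()) and returns an ungrouped result
  count(col1, col2) %>%
  # add the missing (col1, col2) pairs and set their count to 0
  complete(col1, col2, fill = list(n = 0))

res_count
```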
Answer 3: (score: 1)
Another tidyverse possibility, using tidyr::complete():
data %>%
group_by_all() %>%
add_count() %>%
complete(col1, col2, fill = list(n = 0)) %>%
distinct()
col1 col2 n
<fct> <fct> <dbl>
1 A A 4
2 A B 0
3 A C 0
4 B A 1
5 B B 3
6 B C 0
7 C A 1
8 C B 0
9 C C 3
Or using tidyr::expand():
data %>%
  count(col1, col2) %>%
  right_join(data %>% expand(col1, col2),
             by = c("col1", "col2")) %>%
  replace_na(list(n = 0))
Or using tidyr::crossing():
data %>%
  count(col1, col2) %>%
  right_join(crossing(col1 = unique(data$col1),
                      col2 = unique(data$col2)),
             by = c("col1", "col2")) %>%
  replace_na(list(n = 0))
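Because col1 and col2 are factors, there may also be an option inside dplyr itself: group_by() has a .drop argument, and setting it to FALSE keeps groups for level combinations that never occur. A minimal sketch for the statistic from the question, assuming dplyr 1.0.0 or later (.drop arrived in 0.8.0, .groups in 1.0.0) and the example data from the question; res_all is an illustrative name.

```r
library(dplyr)

# Example data from the question
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))

res_all <- data %>%
  # .drop = FALSE keeps empty groups for unused factor level combinations
  group_by(col1, col2, .drop = FALSE) %>%
  # sum() over an empty group is 0, so the missing pairs get stat = 0
  summarize(stat = sum(val2) - sum(val1), .groups = "drop")

res_all
```

This relies on the grouping columns being factors. For statistics that are not zero on empty input (mean(), for example), the empty groups would yield NaN or similar rather than 0.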
Answer 4: (score: 0)
Here is a workaround that I hope works for you: merge the summary table with a table of all combinations, then replace NA with 0.
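data %>%
  group_by(col1, col2) %>%
  summarize(stat = n()) %>%
  # build the combinations from col1 and col2 only, so that val1 and val2 are not expanded too
  merge(unique(expand.grid(data[c("col1", "col2")])), by = c("col1", "col2"), all = TRUE) %>%
  replace_na(list(stat = 0))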
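The same merge-then-fill idea can be sketched for the statistic from the question rather than a plain count. This assumes the example data from the question and dplyr 1.0.0 or later (for .groups); stats, all_pairs and res_merged are illustrative names.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))

# Statistic for the pairs that actually occur (5 rows)
stats <- data %>%
  group_by(col1, col2) %>%
  summarize(stat = sum(val2) - sum(val1), .groups = "drop")

# Every combination of the factor levels (9 rows)
all_pairs <- expand.grid(col1 = levels(data$col1), col2 = levels(data$col2))

# Outer merge keeps all 9 pairs; pairs without data get NA, which is then set to 0
res_merged <- merge(stats, all_pairs, by = c("col1", "col2"), all = TRUE) %>%
  replace_na(list(stat = 0))

res_merged
```

One caveat: after the fill, a pair that never occurs cannot be told apart from a pair whose statistic is genuinely 0.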