Question

Here is a short sample of the data. The original data has many more columns and rows.

head(df, 15)

    ID   col1   col2
1   1  green yellow
2   1  green   blue
3   1  green  green
4   2 yellow   blue
5   2 yellow yellow
6   2 yellow   blue
7   3 yellow yellow
8   3 yellow yellow
9   3 yellow   blue
10  4   blue yellow
11  4   blue yellow
12  4   blue yellow
13  5 yellow yellow
14  5 yellow   blue
15  5 yellow yellow
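
For anyone who wants to reproduce this, the 15 rows printed above can be rebuilt roughly as follows. This is only a sketch of the sample: it assumes both colour columns are plain character vectors (not factors), and the real data has more columns and rows.

```r
# Rebuild the 15-row sample shown above.
# Assumption: col1 and col2 are character columns, not factors.
df <- data.frame(
  ID   = rep(1:5, each = 3),
  col1 = rep(c("green", "yellow", "yellow", "blue", "yellow"), each = 3),
  col2 = c("yellow", "blue",   "green",   # ID 1
           "blue",   "yellow", "blue",    # ID 2
           "yellow", "yellow", "blue",    # ID 3
           "yellow", "yellow", "yellow",  # ID 4
           "yellow", "blue",   "yellow"), # ID 5
  stringsAsFactors = FALSE
)

head(df, 15)
```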

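I want to count how many distinct colors appear in col2, including the color in col1. For example, for ID = 4 there is only one color in col2. If we include col1, there are 2 distinct colors, so the output should be 2, and so on.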

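I tried it the following way, but it does not give the output I want: ID = 4 becomes 0, which is not what I want. So how can I tell R to count the color in col1 as well?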

out <- df %>%
  group_by(ID) %>%
  mutate(N = ifelse(col1 != col2, 1, 0))

My desired output looks like this:

ID  col1    count
1   green   3
2   yellow  2
3   yellow  2
4   blue    2
5   yellow  2

Answer 1

You can do this:

df %>%
 group_by(ID, col1) %>%
 summarise(count = n_distinct(col2))

     ID col1   count
  <int> <chr>  <int>
1     1 green      3
2     2 yellow     2
3     3 yellow     2
4     4 blue       1
5     5 yellow     2

Or even:

df %>%
 group_by(ID, col1) %>%
 summarise_all(n_distinct)

     ID col1    col2
  <int> <chr>  <int>
1     1 green      3
2     2 yellow     2
3     3 yellow     2
4     4 blue       1
5     5 yellow     2
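
Note that both results above count only the colors in col2, so ID 4 comes out as 1 rather than the 2 shown in the question's desired output. One possible way to include col1 is to pool the two columns with c() before counting. The following is a minimal sketch, not necessarily what the answer intended; it assumes col1 and col2 are character vectors and rebuilds the sample data so that it runs on its own.

```r
library(dplyr)

# Sample data from the question (assumed to be character columns).
df <- data.frame(
  ID   = rep(1:5, each = 3),
  col1 = rep(c("green", "yellow", "yellow", "blue", "yellow"), each = 3),
  col2 = c("yellow", "blue", "green", "blue", "yellow", "blue",
           "yellow", "yellow", "blue", "yellow", "yellow", "yellow",
           "yellow", "blue", "yellow"),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(ID, col1) %>%
  # Pool both columns within each group, then count the distinct values.
  summarise(count = n_distinct(c(col1, col2))) %>%
  ungroup()

out
```

Here n_distinct(c(col1, col2)) is used instead of n_distinct(col1, col2), because the latter counts distinct combinations of the two columns rather than distinct colors across them.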

To group by every three rows:

df %>%
 group_by(group = gl(n()/3, 3), col1) %>%
 summarise(count = n_distinct(col2))
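
To see what the gl() call above is doing: with 15 rows, gl(15/3, 3) gives a factor with 5 levels, each repeated 3 times, so consecutive triples of rows share a group. It appears to rely on the row count being a multiple of 3. A hedged alternative that builds the block index from the row number instead is sketched below; the name `by_block` and the rebuilt sample data are just for illustration.

```r
library(dplyr)

# Sample data from the question (assumed to be character columns).
df <- data.frame(
  ID   = rep(1:5, each = 3),
  col1 = rep(c("green", "yellow", "yellow", "blue", "yellow"), each = 3),
  col2 = c("yellow", "blue", "green", "blue", "yellow", "blue",
           "yellow", "yellow", "blue", "yellow", "yellow", "yellow",
           "yellow", "blue", "yellow"),
  stringsAsFactors = FALSE
)

# What gl() produces for 15 rows: levels 1..5, each repeated 3 times.
blocks <- gl(nrow(df) / 3, 3)

by_block <- df %>%
  # Integer division of the zero-based row number assigns rows 1-3 to
  # block 1, rows 4-6 to block 2, and so on.
  mutate(group = (row_number() - 1) %/% 3 + 1) %>%
  group_by(group, col1) %>%
  summarise(count = n_distinct(col2)) %>%
  ungroup()

by_block
```

As in the answer's code, this counts col2 only; swapping in n_distinct(c(col1, col2)) would include col1.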

Count observations by group based on conditions on 2 variables in R
