Conditional statement in dplyr / tidyverse functions to exclude comparisons between identical levels of a factor

Time: 2017-06-28 17:11:32

Tags: r tidyverse

I have a data frame like this:

 data = read.table(text = "region     plot    species
 1          1A      A_B  
 1          1A      A_B
 1          1B      B_C
 1          1C      A_B
 1          1D      C_D
 2          2A      B_C
 2          2A      B_C
 2          2A      E_F
 2          2B      B_C
 2          2B      E_F     
 2          2C      E_F
 2          2D      B_C
 3          3A      A_B
 3          3B      A_B", stringsAsFactors = FALSE, header = TRUE)

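I want to compare every pair of levels of plot and count the unique species that the two plots have in common. However, I do not want a plot to be compared with itself (i.e. remove/exclude 1A_1A, 1B_1B, 2C_2C, etc.). The output for this example should look like this: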

output<-
  region  plot   freq
  1     1A_1B     0     
  1     1A_1C     1
  1     1A_1D     0
  1     1B_1C     0    
  1     1B_1D     0 
  1     1C_1D     0
  2     2A_2B     2     
  2     2A_2C     1
  2     2A_2D     1
  2     2B_2C     1    
  2     2B_2D     1 
  2     2C_2D     0
  3     3A_3B     1  

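I have adapted the following code from @HubertL's answer to "Convert list of matrices to a single data frame", but I am struggling to work in an appropriate if/else statement to meet this condition: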

library(tidyverse)

data %>% group_by(region, species) %>% 
    filter(n() > 1) %>%
    summarize(y = list(combn(plot, 2, paste, collapse="_"))) %>% 
    unnest %>%
    group_by(region, y) %>% 
    summarize(ifelse(plot[i] = plot[i], freq = 
    length(unique((species),)
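
To see where the self-pairs come from, here is a minimal base R sketch. The vector plots is a hypothetical stand-in for what one region/species group contains (region 1, species A_B in the example data, where plot 1A is recorded twice); it is only an illustration, not part of the attempt above.

```r
# Plots recorded for region 1, species A_B (1A appears twice in the data)
plots <- c("1A", "1A", "1C")

# Pairing the raw values keeps the duplicate, so a self-pair shows up
pairs_raw <- combn(plots, 2, paste, collapse = "_")
print(pairs_raw)     # "1A_1A" "1A_1C" "1A_1C"

# Dropping repeated plots first leaves only the between-plot pair
pairs_unique <- combn(unique(plots), 2, paste, collapse = "_")
print(pairs_unique)  # "1A_1C"
```

So the unwanted comparisons appear to come from duplicated rows rather than from anything that needs an if/else inside summarize().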

1 answer:

Answer 0: (score: 0)

You can filter out the duplicates by adding filter(!duplicated(plot)):

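data %>% group_by(region, species) %>% 
  # keep each plot only once per region/species, so no plot is paired with itself
  filter(!duplicated(plot)) %>%
  # a species must occur in at least two plots to form a pair
  filter(n() > 1) %>%
  summarize(y = list(combn(plot, 2, paste, collapse="_"))) %>% 
  # name the list column explicitly; recent tidyr versions warn without it
  unnest(y) %>%
  group_by(region, y)  %>% 
  # one row per shared species, so the row count is the number of matches
  summarize(freq=n())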

  region     y  freq
   <int> <chr> <int>
1      1 1A_1C     1
2      2 2A_2B     2
3      2 2A_2C     1
4      2 2A_2D     1
5      2 2B_2C     1
6      2 2B_2D     1
7      3 3A_3B     1
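
Note that this result only lists pairs that share at least one species, while the output requested in the question also has the pairs with freq 0. One possible way to get those, as a sketch only: build every plot pair per region separately and left-join the counts onto it. The names obs, all_pairs, shared and result are made up for this illustration, and the code assumes dplyr >= 1.0 (for .groups) and tidyr >= 1.0; sort() is added so each pair is always written in the same order.

```r
library(dplyr)
library(tidyr)

# Example data from the question
data <- read.table(text = "region plot species
1 1A A_B
1 1A A_B
1 1B B_C
1 1C A_B
1 1D C_D
2 2A B_C
2 2A B_C
2 2A E_F
2 2B B_C
2 2B E_F
2 2C E_F
2 2D B_C
3 3A A_B
3 3B A_B", stringsAsFactors = FALSE, header = TRUE)

# One row per distinct region/plot/species combination
obs <- distinct(data, region, plot, species)

# Every pair of different plots within a region (needs at least two plots)
all_pairs <- obs %>%
  distinct(region, plot) %>%
  group_by(region) %>%
  filter(n() > 1) %>%
  summarize(y = list(combn(sort(plot), 2, paste, collapse = "_")),
            .groups = "drop") %>%
  unnest(y)

# Number of species shared by each pair, as in the answer above
shared <- obs %>%
  group_by(region, species) %>%
  filter(n() > 1) %>%
  summarize(y = list(combn(sort(plot), 2, paste, collapse = "_")),
            .groups = "drop") %>%
  unnest(y) %>%
  count(region, y, name = "freq")

# Pairs without any shared species get freq 0
result <- all_pairs %>%
  left_join(shared, by = c("region", "y")) %>%
  mutate(freq = replace_na(freq, 0L))

print(result)
```

For the example data this should give one row per pair listed in the question, including 1A_1B and 2C_2D with freq 0.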