R

Time: 2017-10-05 16:42:31

Tags: r dataframe subset extract

I have multiple data frames like the ones below. They have many more species columns, which I have not shown here.

D1:

Year   Region  Sites      Depth Transect Pharia pyramidatus
2000   LP     BALLENA      5        1        0.03
2000   LP     ISLOTES      5        1        0.20
2000   LP     NORTE        5        1        0.10
2000   LP     NORTE       20        1        0.00

D2

Year   Region  Sites      Depth Transect Pharia pyramidatus
2010   LP     PLAYA        5        1        0.03
2010   LP     ISLOTES      5        1        0.20
2010   LP     NORTE        5        1        0.10
2010   LP     NORTE       20        1        0.00

D3

Year   Region  Sites      Depth Transect Pharia pyramidatus
2016   LP     BALLENA      5        1        0.03
2016   LP     ISLOTES      5        1        0.20
2016   LP     SUR          5        1        0.10
2016   LP     NORTE       20        1        0.00

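What I want to do is extract only the sites (Reef) that appear in every year, and combine the result into a single data frame that should look like this: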

Year   Region  Reef      Depth Transect Pharia pyramidatus
2000   LP     ISLOTES      5        1        0.20
2000   LP     NORTE        5        1        0.10
2000   LP     NORTE       20        1        0.00
2010   LP     ISLOTES      5        1        0.20
2010   LP     NORTE        5        1        0.10
2010   LP     NORTE       20        1        0.00
2016   LP     ISLOTES      5        1        0.20
2016   LP     NORTE       20        1        0.00

Thank you very much for your help.

1 Answer:

Answer 0: (score: 1)

A solution with dplyr:

library(dplyr)
rbind(df1, df2, df3) %>%
  group_by(Reef) %>%
  filter(n_distinct(Year) == 3)

Result:

# A tibble: 8 x 6
# Groups:   Reef [2]
   Year Region    Reef Depth Transect Pharia_pyramidatus
  <int> <fctr>  <fctr> <int>    <int>              <dbl>
1  2000     LP ISLOTES     5        1                0.2
2  2000     LP   NORTE     5        1                0.1
3  2000     LP   NORTE    20        1                0.0
4  2010     LP ISLOTES     5        1                0.2
5  2010     LP   NORTE     5        1                0.1
6  2010     LP   NORTE    20        1                0.0
7  2016     LP ISLOTES     5        1                0.2
8  2016     LP   NORTE    20        1                0.0

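Notes:

n_distinct(Year) counts the number of distinct Year values for each Reef (because I group_by(Reef)). I want n_distinct(Year) == 3 because I only want rows returned for a Reef that has records in every Year, which is 3 years in this case. In a more general case with more years, you would first find the number of distinct Year values in the combined data frame and then filter on that number, for example: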

rbind(df1, df2, df3) %>%
  mutate(Year_distinct = n_distinct(Year)) %>%
  group_by(Reef) %>%
  filter(n_distinct(Year) == Year_distinct) %>%
  select(-Year_distinct)
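
To see what the filter compares against, it can help to look at the per-reef year counts directly. The following is a minimal sketch and not part of the original answer: the names obs, year_counts, n_years_total and complete_reefs are made up for illustration, and the data is a trimmed copy of the question's tables with only the Year and Reef columns.

```r
library(dplyr)

# Toy data mirroring the question (only the two columns needed here).
obs <- data.frame(
  Year = c(2000, 2000, 2000, 2000,
           2010, 2010, 2010, 2010,
           2016, 2016, 2016, 2016),
  Reef = c("BALLENA", "ISLOTES", "NORTE", "NORTE",
           "PLAYA",   "ISLOTES", "NORTE", "NORTE",
           "BALLENA", "ISLOTES", "SUR",   "NORTE")
)

# One row per Reef, with the number of distinct years it was recorded in.
year_counts <- obs %>%
  group_by(Reef) %>%
  summarise(n_years = n_distinct(Year))

print(year_counts)

# Total number of distinct years in the combined data.
n_years_total <- n_distinct(obs$Year)

# Reefs whose distinct-year count equals the total, i.e. present in every year.
complete_reefs <- as.character(year_counts$Reef[year_counts$n_years == n_years_total])
print(complete_reefs)
```

Only reefs whose count matches the total number of years survive the filter, which is the same comparison the pipelines above make row by row.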

Data:

df1 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                 2000   LP     BALLENA      5        1        0.03
                 2000   LP     ISLOTES      5        1        0.20
                 2000   LP     NORTE        5        1        0.10
                 2000   LP     NORTE       20        1        0.00", header = TRUE)

df2 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                 2010   LP     PLAYA        5        1        0.03
                 2010   LP     ISLOTES      5        1        0.20
                 2010   LP     NORTE        5        1        0.10
                 2010   LP     NORTE       20        1        0.00", header = TRUE)

df3 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                 2016   LP     BALLENA      5        1        0.03
                 2016   LP     ISLOTES      5        1        0.20
                 2016   LP     SUR          5        1        0.10
                 2016   LP     NORTE       20        1        0.00", header = TRUE)
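
For comparison, the same result can be sketched in base R without dplyr. This is a different technique from the answer above: it intersects the Reef values across a list of data frames, so it assumes each data frame holds exactly one year. The names dfs, common_reefs, combined and result are hypothetical, and the data frames are rebuilt here with fewer columns so the block runs on its own.

```r
# Trimmed copies of df1, df2, df3 (Year, Reef and Depth only).
df1 <- data.frame(Year = 2000,
                  Reef = c("BALLENA", "ISLOTES", "NORTE", "NORTE"),
                  Depth = c(5, 5, 5, 20))
df2 <- data.frame(Year = 2010,
                  Reef = c("PLAYA", "ISLOTES", "NORTE", "NORTE"),
                  Depth = c(5, 5, 5, 20))
df3 <- data.frame(Year = 2016,
                  Reef = c("BALLENA", "ISLOTES", "SUR", "NORTE"),
                  Depth = c(5, 5, 5, 20))

dfs <- list(df1, df2, df3)

# Reefs present in every data frame: intersect the Reef columns one pair at a time.
common_reefs <- Reduce(intersect, lapply(dfs, function(d) as.character(d$Reef)))

# Stack the data frames and keep only the rows for the shared reefs.
combined <- do.call(rbind, dfs)
result <- combined[combined$Reef %in% common_reefs, ]
rownames(result) <- NULL

print(result)
```

Keeping the data frames in a list means adding another year only requires appending to dfs, with no hard-coded count of years.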