Question

I have to subset the given dataset based on the frequency of the values in the genus column.

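library(dplyr)
library(ggplot2)  # provides the msleep dataset

msleep %>%
  count(genus) %>%   # number of rows per genus
  count(n)           # number of genera at each frequency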

# # A tibble: 3 x 2
#       n    nn
#   <int> <int>
# 1     1    73
# 2     2     2
# 3     3     2

After looking at the output, I am interested in the values that occur with frequency 2. I obtain them with the following:

msleep %>%
  group_by(genus) %>%
  filter(n() == 2)

# Source: local data frame [4 x 11]
# Groups: genus [2]
# 
# # A tibble: 4 x 11
#         name  genus  vore          order conservation sleep_total sleep_rem sleep_cycle awake brainwt bodywt
#        <chr>  <chr> <chr>          <chr>        <chr>       <dbl>     <dbl>       <dbl> <dbl>   <dbl>  <dbl>
# 1      Horse  Equus herbi Perissodactyla domesticated         2.9       0.6        1.00  21.1  0.6550 521.00
# 2     Donkey  Equus herbi Perissodactyla domesticated         3.1       0.4          NA  20.9  0.4190 187.00
# 3 Arctic fox Vulpes carni      Carnivora         <NA>        12.5        NA          NA  11.5  0.0445   3.38
# 4    Red fox Vulpes carni      Carnivora         <NA>         9.8       2.4        0.35  14.2  0.0504   4.23

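However, I want to find the indices of these rows in the main dataset (msleep) when the actual values (Equus, Vulpes) are not known in advance. How can I do this?

I achieved it with the workaround below. Is this the right approach, or is there a more efficient way?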

msleep %>%
  rowid_to_column() %>% 
  group_by(genus) %>%
  filter(n() == 2) %>%
  ungroup() %>% 
  select(rowid)

# # A tibble: 4 x 1
#   rowid
#   <int>
# 1    23
# 2    24
# 3    82
# 4    83
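
For comparison, here is a minimal sketch of two other ways to get the same row positions without naming the genera in advance. It assumes msleep comes from ggplot2 and that the genus column has no missing values. `rows_with_frequency` is a hypothetical helper written for this illustration and is not part of any package. This is one possible approach, not necessarily the most efficient one.

```r
library(dplyr)
library(ggplot2)  # assumed here only as the source of the msleep data

# Hypothetical helper: positions in `x` whose value occurs exactly
# `size` times in `x` (base R only).
rows_with_frequency <- function(x, size) {
  tab <- table(x)                     # occurrences of each distinct value
  wanted <- names(tab)[tab == size]   # values occurring exactly `size` times
  which(x %in% wanted)                # their positions in the original vector
}

idx <- rows_with_frequency(msleep$genus, 2)
idx                 # row positions in msleep
msleep$genus[idx]   # the genera found, without naming them beforehand

# dplyr variant: record the row position, attach the per-genus count to
# every row with add_count(), then keep rows whose count is 2.
idx_dplyr <- msleep %>%
  mutate(rowid = row_number()) %>%
  add_count(genus) %>%
  filter(n == 2) %>%
  pull(rowid)
```

Both variants return a plain integer vector rather than a one-column tibble, so the result can be used directly for indexing, for example `msleep[idx, ]`.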

Identify the indices of selected values specific to a column of a dataset
