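Split-apply-combine in R

Asked: 2017-12-19 23:53:23

Tags: r sorting dplyr plyr

I am trying to sort a data frame by a continuous variable across two categorical variables. Specifically, I want to sort on the continuous variable (descending) while keeping rows of the same type together. Here is an example: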

    pets <- data.frame(animal = c("dog", "dog", "dog", "cat", "cat", "fish", "fish", "fish"),
       breed = c("retriever", "husky", "husky", "grey", "white", "guppy", "betta", "betta"),
       count = c(4, 3, 7, 8, 9, 2, 12, 1))

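Right now the data frame is unsorted. I want to sort it so that the breeds with the highest mean count appear first, while all the animals (and breeds) stay grouped together. If I order the frame by count, I lose the correct order of breed, and vice versa. Even if I try both:

    pets[with(pets, order(breed, -count)), ]

the output is not sorted correctly. I went through some split-apply-combine tutorials, but I could only find ones that keep one grouping of the data together, not two as in my data.

Right now, this is the best I've got:

    split_pets <- split(pets, pets$animal)
    unlist(lapply(split_pets, function(x) sort(with(x, tapply(count, breed, mean)), decreasing = TRUE)))

which returns

        cat.white      cat.grey     dog.husky dog.retriever    fish.betta    fish.guppy
              9.0           8.0           5.0           4.0           6.5           2.0

So sure, I've got the right order. But I don't actually care about the means; I just need the original data frame sorted according to them. The next step down this rabbit hole is to split again by breed, but then I would be sorting on data frame columns inside a list of lists. That sounds far too complicated. I also tried calculating the means with group_by() and then passing them to order, but that did not get me any further than I am now.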

Thanks for your help!

1 answer:

Answer 0 (score: 1)

You can sort at the group level first and then join the result back to the original data in the desired order.

    pets <- data.frame(
      animal = c("dog", "dog", "dog", "cat", "cat", "fish", "fish", "fish"),
      breed = c("retriever", "husky", "husky", "grey", "white", "guppy", "betta", "betta"),
      count = c(4, 3, 7, 8, 9, 2, 12, 1),
      stringsAsFactors = FALSE
    )

    library(dplyr)

    pets %>%
      # mean count for each animal/breed combination
      group_by(animal, breed) %>%
      summarise(avg = mean(count)) %>%
      # attach the breed mean to every original row
      right_join(pets, by = c("animal", "breed")) %>%
      # keep animals together, highest breed mean first, then highest count
      arrange(animal, desc(avg), desc(count)) %>%
      # drop the helper column
      select(-avg) %>%
      ungroup()

    # # A tibble: 8 x 3
    #   animal     breed count
    #    <chr>     <chr> <dbl>
    # 1    cat     white     9
    # 2    cat      grey     8
    # 3    dog     husky     7
    # 4    dog     husky     3
    # 5    dog retriever     4
    # 6   fish     betta    12
    # 7   fish     betta     1
    # 8   fish     guppy     2
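
For comparison, the same idea (compute a per-breed mean, then order on it) can probably be expressed in base R without a join. The following is only a sketch, not part of the original answer; the names breed_avg and sorted_pets are made up for illustration, and it assumes the breed mean is computed within each animal, as in the question.

```r
# Same example data as in the question.
pets <- data.frame(
  animal = c("dog", "dog", "dog", "cat", "cat", "fish", "fish", "fish"),
  breed = c("retriever", "husky", "husky", "grey", "white", "guppy", "betta", "betta"),
  count = c(4, 3, 7, 8, 9, 2, 12, 1),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as `count` that holds, for every row,
# the mean count of that row's animal/breed combination.
# (breed_avg is a hypothetical helper name.)
breed_avg <- ave(pets$count, pets$animal, pets$breed, FUN = mean)

# Order by animal, then by descending breed mean, then by descending count.
sorted_pets <- pets[order(pets$animal, -breed_avg, -pets$count), ]
rownames(sorted_pets) <- NULL  # reset row names after reordering
sorted_pets
```

Because ave() keeps one value per original row, the helper vector can be passed straight to order(), so no helper column has to be added and dropped again. The resulting row order should match the dplyr output above.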