Using mutate() to access values in S3 list-like objects (or extracting values from S3 objects)

Asked: 2017-05-20 21:41:03

Tags: r dplyr

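I want to apply operations to a data frame (tibble) column that contains S3 list-like objects, operating on one named item from each object in the column. As shown at the bottom of the question, I have this working by calling sapply() inside mutate(), but that seems like it should be unnecessary.

If the information is stored in columns containing atomic data, dplyr functions such as mutate() work as expected. This works, for example: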

library(dplyr)
people_cols <- tibble(name = c("Fiona Foo", "Barry Bar", "Basil Baz"),
                  height_mm = c(1750, 1700, 1800),
                  weight_kg = c(75, 73, 74)) %>%
  mutate(height_inch = height_mm / 25.4)
people_cols
# # A tibble: 3 × 4
#   name          height_mm   weight_kg   height_inch
#   <chr>         <dbl>       <dbl>       <dbl>
# 1 Fiona Foo     1750        75          68.89764
# 2 Barry Bar     1700        73          66.92913
# 3 Basil Baz     1800        74          70.86614

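But I want to work with data held in S3 list-like objects. Here is a toy example:

# Constructor for a toy S3 class: a named list tagged with class "person_stats"
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name,
                 height_mm = height_mm,
                 weight_kg = weight_kg),
            class = "person_stats")
}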

fiona <- person_stats("Fiona Foo", 1750, 75)
barry <- person_stats("Barry Bar", 1700, 73)
basil <- person_stats("Basil Baz", 1800, 74)

fiona$height_mm
# [1] 1750

I can put these objects into a tibble column like this:

people <- tibble(personstat = list(fiona, barry, basil))

people
# # A tibble: 3 × 1
# personstat
#     <list>
#   1 <S3: person_stats>
#   2 <S3: person_stats>
#   3 <S3: person_stats>

But if I try to use mutate() on the column containing these objects, I get an error:

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  mutate(height_inch = personstat$height_mm / 25.4)
# Error in mutate_impl(.data, dots) : object 'personstat' not found

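Keeping it as simple as possible: if I could just reference the named item on its own, I could at least pull the values into a new column and then operate on them from there: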

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  mutate(height_mm = personstat$height_mm)
# Error in mutate_impl(.data, dots) : 
#  Unsupported type NILSXP for column "height_mm"

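Note the different error, which is interesting: it no longer complains about finding the column, it only struggles with the named item.

I can get it to work using the base functions cbind() and sapply(), with [[ as the function: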
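A minimal base-R sketch of what is probably going on here (no dplyr involved, so it does not reproduce the exact error messages): inside a non-rowwise mutate(), the name personstat refers to the whole list column, and applying $ to that outer list looks for an element named height_mm among the three persons rather than inside each of them. The NILSXP in the second error is consistent with that lookup returning NULL. The variable names outer_lookup and first_height are only illustrative.

```r
# Same toy constructor as in the question
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}

# The list column, seen as a whole: a plain, unnamed list of three objects
personstat <- list(person_stats("Fiona Foo", 1750, 75),
                   person_stats("Barry Bar", 1700, 73),
                   person_stats("Basil Baz", 1800, 74))

# `$` on the outer list searches its (non-existent) names, not the elements
outer_lookup <- personstat$height_mm
is.null(outer_lookup)
# [1] TRUE

# The value is only reachable after selecting one element first
first_height <- personstat[[1]]$height_mm
first_height
# [1] 1750
```

This is why some per-element step (sapply(), rowwise(), or a map function) is needed to reach inside each object.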

people <- tibble(personstat = list(fiona, barry, basil)) %>%
  cbind(height_mm = sapply(.$personstat, '[[', name="height_mm"))

people
#            personstat height_mm
# 1 Fiona Foo, 1750, 75      1750
# 2 Barry Bar, 1700, 73      1700
# 3 Basil Baz, 1800, 74      1800

Although that loses the tibble-ness:

class(people)
# [1] "data.frame"

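Finally, this gets me what I want, but it feels like using sapply() misses the point of dplyr's mutate(), which I think should work its way down the column without needing it: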

people <- tibble(personstat = list(fiona, barry, basil)) %>%
   mutate(height_mm = sapply(.$personstat, '[[', name="height_mm"))
people
# A tibble: 3 x 2
#           personstat height_mm
#               <list>     <dbl>
# 1 <S3: person_stats>      1750
# 2 <S3: person_stats>      1700
# 3 <S3: person_stats>      1800

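Is there a way to get the output above using mutate() without having to rely on something like sapply()? Or, indeed, is there any other sensible way to extract named values from list-like S3 objects stored in a tibble column?

2 Answers:

Answer 0 (score: 2)

rowwise() can handle this kind of case: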

people <- tibble(personstat = list(fiona, barry, basil))

people %>%
    rowwise() %>%
    mutate(height_mm = personstat$height_mm)
# # A tibble: 3 × 2
#           personstat height_mm
#               <list>     <dbl>
# 1 <S3: person_stats>      1750
# 2 <S3: person_stats>      1700
# 3 <S3: person_stats>      1800

people %>%
    rowwise() %>%
    mutate(height_inch = personstat$height_mm / 25.4)

# # A tibble: 3 × 2
#           personstat height_inch
#               <list>       <dbl>
# 1 <S3: person_stats>    68.89764
# 2 <S3: person_stats>    66.92913
# 3 <S3: person_stats>    70.86614
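One point worth keeping in mind with this approach: rowwise() returns a row-wise grouped tibble, so subsequent verbs keep operating one row at a time until ungroup() is called. The sketch below assumes a reasonably recent dplyr (1.0 or later), where a row-wise result carries the class rowwise_df; the variable name people_inch is only illustrative.

```r
library(dplyr)

# Same toy constructor as in the question
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}

people <- tibble(personstat = list(person_stats("Fiona Foo", 1750, 75),
                                   person_stats("Barry Bar", 1700, 73),
                                   person_stats("Basil Baz", 1800, 74)))

people_inch <- people %>%
  rowwise() %>%                                       # evaluate mutate() one row at a time
  mutate(height_inch = personstat$height_mm / 25.4) %>%
  ungroup()                                           # drop the row-wise grouping again

people_inch$height_inch
```

Without the final ungroup(), a later summarise() or mutate() would still be evaluated per row, which is easy to overlook.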

Answer 1 (score: 1)

If you want to keep it within the tidyverse, you can use purrr::map_dbl here:

library(tidyverse)    
people %>% mutate(height = map_dbl(personstat, "height_mm"))
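The same idea extends to the height_inch calculation from the question by passing a function instead of a name. This is a sketch rather than part of the original answer, and the variable name people_inch is only illustrative. map_dbl() always returns a double vector with one value per element and raises an error if any element yields something else, which makes it stricter than sapply().

```r
library(dplyr)
library(purrr)

# Same toy constructor as in the question
person_stats <- function(name, height_mm, weight_kg) {
  structure(list(name = name, height_mm = height_mm, weight_kg = weight_kg),
            class = "person_stats")
}

people <- tibble(personstat = list(person_stats("Fiona Foo", 1750, 75),
                                   person_stats("Barry Bar", 1700, 73),
                                   person_stats("Basil Baz", 1800, 74)))

# Apply a function to every element of the list column; one double per person
people_inch <- people %>%
  mutate(height_inch = map_dbl(personstat, function(p) p$height_mm / 25.4))

people_inch$height_inch
```

The result stays a tibble, and no row-wise grouping is introduced along the way.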