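Expanding rows and adding columns in a data frame based on another data frame

Time: 2018-02-24 02:52:00

Tags: r list dataframe apply mapply

Overview

Each row in team.df contains one NBA team. Each data frame in list.of.all.stars contains multiple rows, depending on the number of all-star players associated with that NBA team.

Using the apply() family of functions, how do I expand the rows of team.df so that each team is repeated once per all-star player, and merge the columns from list.of.all.stars into the final output?

I'm completely open to non-apply() methods; I only mention apply() as an example of how I'd like to avoid writing a loop.

Here is my desired output: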

#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

Reproducible example

# create data frame 
# about team information
team.df <-
  data.frame(
    Team_Name       = c( "Cavaliers", "Warriors" )
    , Team_Location = c( "Cleveland, OH", "Oakland, CA")
    , stringsAsFactors = FALSE
  )

# create list about
# all stars on each team
list.of.all.stars <-
  list( 
    data.frame(
      Player = c( "LeBron James", "Kevin Love" )
      , Captain = c( TRUE, FALSE )
      , stringsAsFactors = FALSE
    )
    , data.frame( 
      Player = c( "Stephen Curry", "Kevin Durant"
                  , "Klay Thompson", "Draymond Green"
      )
      , Captain = c( TRUE, FALSE, FALSE, FALSE )
      , stringsAsFactors = FALSE
    )
  )

Non-apply() family method

# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
  list(
    cbind(
      team.df[ 1, ]
      , list.of.all.stars[[1]]
    )
    ,   cbind(
      team.df[ 2, ]
      , list.of.all.stars[[2]]
    )
  )
# Warning messages:
#   1: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded

# collapse each list
# into data frame
final.df <-
  data.frame(
    do.call(
      what = "rbind"
      , args = team.and.all.stars.list.of.df
    )
    , stringsAsFactors = FALSE
  )
# view final output
final.df
# Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

# end of script #

Failed mapply() attempt

# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
  mapply(
    FUN = function( x, y )
      cbind.data.frame(
        x
        , y
        , stringsAsFactors = FALSE
      )
    , team.df
    , list.of.all.stars
  )

# view results
mapply.method
#         Team_Name   Team_Location
# x       Character,2 Character,4  
# Player  Character,2 Character,4  
# Captain Logical,2   Logical,4 

# end of script #

2 Answers:

Answer 0: (score: 3)

Given the edit to the question and the desired output, I would use data.table for the whole task.

library(data.table)

## combine the list of all stars into one data.table
## creating an 'id' column 
dt_players <- rbindlist(list.of.all.stars, idcol = TRUE)

## we can keep/use the row names as the order of the data 
## is consistent with the list elements 
dt_teams <- as.data.table(team.df, keep.rownames = TRUE)
dt_teams[, rn := as.integer(rn)]

## use a join to combine the data to get the desired result. 
dt_teams[
  dt_players
  , on = c(rn = ".id")
]

#    rn Team_Name Team_Location         Player Captain
# 1:  1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2:  1 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3:  2  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4:  2  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5:  2  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6:  2  Warriors   Oakland, CA Draymond Green   FALSE
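
For readers without data.table, the same idea (tag each player table with the position of its list element, then join on that tag) can be sketched in base R. The helper column name id is an assumption made for illustration, and the sketch relies on row i of team.df corresponding to element i of list.of.all.stars.

```r
# Example data copied from the question.
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Tag every player table with the index of its list element.
# 'id' is a hypothetical helper column, not part of the original data.
players <- do.call(
  rbind,
  Map(
    function(stars, i) cbind(id = i, stars),
    list.of.all.stars,
    seq_along(list.of.all.stars)
  )
)

# Give the teams the matching index, based on row position.
teams <- cbind(id = seq_len(nrow(team.df)), team.df)

# Join on the helper column, then drop it.
merged <- merge(teams, players, by = "id")
merged$id <- NULL
merged
```

Unlike the data.table join, merge() sorts by the key column, so the order of rows within a team should be checked if it matters.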

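Old answer

This approach uses data.table to do the actual work, but I've included an sapply step that gets the number of times each row of the team.df data frame should be repeated.

It also assumes that the order of teams in team.df matches the order of the elements in list.of.all.stars (i.e. each row of the data.frame corresponds to a list element).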

library(data.table)

## number of rows in each data.frame
reps <- sapply(list.of.all.stars, nrow)

## replicate the rows of the data.frame
setDT(team.df)[rep(1:.N, reps), ]

#    Team_Name Team_Location
# 1: Cavaliers Cleveland, OH
# 2: Cavaliers Cleveland, OH
# 3:  Warriors   Oakland, CA
# 4:  Warriors   Oakland, CA
# 5:  Warriors   Oakland, CA
# 6:  Warriors   Oakland, CA

If you don't want to use data.table, the same approach can be applied to a data.frame

team.df[rep(row.names(team.df), reps), ]
#     Team_Name Team_Location
# 1   Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2    Warriors   Oakland, CA
# 2.1  Warriors   Oakland, CA
# 2.2  Warriors   Oakland, CA
# 2.3  Warriors   Oakland, CA

Or use a similar concept, but all within lapply

lst <- lapply(seq_along(list.of.all.stars), function(x) {
  df <- team.df[x, ]
  df[rep(row.names(df), nrow(list.of.all.stars[[x]])), ]
})

do.call(rbind, lst)
#     Team_Name Team_Location
# 1   Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2    Warriors   Oakland, CA
# 2.1  Warriors   Oakland, CA
# 2.2  Warriors   Oakland, CA
# 2.3  Warriors   Oakland, CA
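
The snippets in this old answer expand team.df but stop before attaching the player columns. A minimal base R sketch of that last step follows, under the same assumption that row i of team.df corresponds to element i of list.of.all.stars. The names expanded and final.df are chosen for illustration only.

```r
# Example data copied from the question.
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Number of all stars per team; vapply() guarantees an integer vector.
reps <- vapply(list.of.all.stars, nrow, integer(1))

# Repeat each team row once per all star on that team.
expanded <- team.df[rep(seq_len(nrow(team.df)), times = reps), ]

# Stack the player tables and bind them column-wise to the expanded teams.
final.df <- cbind(expanded, do.call(rbind, list.of.all.stars))
row.names(final.df) <- NULL
final.df
```

Because both pieces have the same number of rows before cbind() is called, nothing is recycled.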

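Answer 1: (score: 3)

Regarding the OP's approach of passing 'team.df' as an input to Map/mapply: 'team.df' is a data.frame, which is a list of columns, so the basic unit of iteration is a column vector. Map/mapply therefore loops over the columns rather than over the whole dataset or its rows (which is what the desired output requires). To prevent that, wrap it in list(): it then becomes a single unit that is recycled across each list element of 'list.of.all.stars'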

do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
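
To make the column-wise iteration concrete, here is a small check, assuming the question's example data. It only records what Map() hands to the function as its first argument; the name seen is made up for illustration.

```r
# Example data copied from the question.
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Record the class of the first argument on each iteration.
seen <- Map(function(x, y) class(x), team.df, list.of.all.stars)

# The iterations are named after the columns of team.df, and each 'x'
# is a whole column (a character vector), not a row.
names(seen)
seen[[1]]

# Wrapped in list(), team.df is a single element of length 1.
length(team.df)
length(list(team.df))
```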

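Based on the expected output, each row of 'team.df' should be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by row and then do the cbind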

res <- do.call(rbind, Map(cbind,  split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
#   Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE

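We can also do this in the tidyverse. After grouping by all the columns of 'team.df', nest to create a list column 'data' (of length 2), use mutate to assign 'list.of.all.stars' to 'data', and then unnest the list column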

library(tidyverse)
team.df %>% 
      group_by_all() %>%
      nest %>% 
      mutate(data = list.of.all.stars) %>% 
      unnest
# A tibble: 6 x 4
#  Team_Name Team_Location Player         Captain
#  <chr>     <chr>         <chr>          <lgl>  
# 1 Cavaliers Cleveland, OH LeBron James   T      
# 2 Cavaliers Cleveland, OH Kevin Love     F      
# 3 Warriors  Oakland, CA   Stephen Curry  T      
# 4 Warriors  Oakland, CA   Kevin Durant   F      
# 5 Warriors  Oakland, CA   Klay Thompson  F      
# 6 Warriors  Oakland, CA   Draymond Green F
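
With more recent tidyverse releases (tidyr 1.0.0 or later, as far as I know), unnest() expects the columns to unnest to be named through cols, and the group/nest step may not be needed at all. The following is a sketch under that assumption rather than a drop-in replacement for the code above. The result name res is chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question.
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Attach the list of player tables as a list column (one element per team row),
# then expand that list column into regular rows and columns.
res <- team.df %>%
  mutate(data = list.of.all.stars) %>%
  unnest(cols = data)
res
```

This relies on list.of.all.stars having exactly one element per row of team.df, in the same order.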