Manipulating a data frame in R (aggregating by player)

Time: 2017-07-20 17:58:19

Tags: r csv dataframe datatable

I have a CSV file in the following format:

Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4
. . . . 
. . . .

I want to change it to the following format:

Name&Sports   Win         Loss    Total
Brian         16           4       20
Football      5            3       8
Basketball    4            1       5
Bowling       7            0       7
Chris         6            7       13
Football      3            3       6
Basketball    3            4       7   
. . . .
. . . . 

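Basically, the new format has one block of rows per player. The first row of a block holds the player's name and that player's wins, losses, and total games played, summed across all sports. Each following row holds one sport the player played, with the wins, losses, and total games for that sport alone. After the player's last sport, the next player's block starts in the same way.

Is there a simple way to do this in R?

2 answers:

Answer 0 (score: 3)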

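# Sample data from the question
df <- read.table(text = "Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4", header = TRUE)

# Per-player sums of Win and Loss
tmp <- aggregate(df$Win, by = list(df$Player), sum)
tmp <- cbind(tmp, aggregate(df$Loss, by = list(df$Player), sum)[2])
names(tmp) <- colnames(df)[2:4]   # Sports, Win, Loss; player names go in the Sports column

# Append the player rows below the per-sport rows, then add the Total column
df <- rbind(df[, 2:ncol(df)], tmp)
df$Total <- df$Loss + df$Win
df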
      Sports Win Loss Total
1   Football   5    3     8
2 Basketball   4    1     5
3    Bowling   7    0     7
4   Football   3    3     6
5 Basketball   3    4     7
6      Brian  16    4    20
7      Chris   6    7    13
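The two aggregate() calls plus the cbind() above can be folded into one call with aggregate()'s formula interface, which sums several columns per group at once. A minimal sketch, assuming the same sample data; the names `scores` and `player_totals` are illustrative only:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps names as character
scores <- read.table(text = "Player Sports Win Loss
Brian Football 5 3
Brian Basketball 4 1
Brian Bowling 7 0
Chris Football 3 3
Chris Basketball 3 4", header = TRUE, stringsAsFactors = FALSE)

# cbind(Win, Loss) ~ Player sums both columns within each player in one call
player_totals <- aggregate(cbind(Win, Loss) ~ Player, data = scores, FUN = sum)

# Rename the grouping column so it lines up with the per-sport rows
names(player_totals)[1] <- "Sports"
print(player_totals)
```

player_totals has the same three columns (Sports, Win, Loss) as tmp, so it should drop into the rbind() step unchanged.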

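Alternatively, if the row order in the example matters, build the combined table from the original df (instead of the rbind() step above) like this: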

df <- rbind(tmp[1,], df[1:3,2:ncol(df)], 
            tmp[2,], df[4:nrow(df),2:ncol(df)]) # could easily be made more programmatic          
df$Total <- df$Loss + df$Win
df
       Sports Win Loss Total
1       Brian  16    4    20
2    Football   5    3     8
3  Basketball   4    1     5
4     Bowling   7    0     7
21      Chris   6    7    13
41   Football   3    3     6
5  Basketball   3    4     7
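The hard-coded row indices above are what the comment means by "more programmatic". One way to remove them, sketched in base R under the assumption that each player's sport rows should keep their original order, is to split the data by player, build each block with a small helper, and bind the blocks back together. The helper `player_block` and the names `scores`, `blocks`, and `result` are hypothetical:

```r
scores <- read.table(text = "Player Sports Win Loss
Brian Football 5 3
Brian Basketball 4 1
Brian Bowling 7 0
Chris Football 3 3
Chris Basketball 3 4", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one summary row for the player, then that player's sport rows
player_block <- function(d) {
  summary_row <- data.frame(Sports = d$Player[1],
                            Win = sum(d$Win),
                            Loss = sum(d$Loss),
                            stringsAsFactors = FALSE)
  rbind(summary_row, d[, c("Sports", "Win", "Loss")])
}

# split() returns one data frame per player, in sorted order of player name;
# unname() keeps the list names from being passed to rbind() as argument names
blocks <- lapply(split(scores, scores$Player), player_block)
result <- do.call(rbind, unname(blocks))
result$Total <- result$Win + result$Loss
rownames(result) <- NULL
print(result)
```

Because split() orders the blocks by sorted player name (Brian before Chris here), the result matches the example; a different player order would need an explicit reordering of `blocks`.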

Answer 1 (score: 2)

A solution using the tidyverse. dt_final is the final output.

# Create example data frame
dt <- read.table(text = "Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4",
                 header = TRUE, stringsAsFactors = FALSE)

# Load package
library(tidyverse)

# Split data frame by players
dt_list <- split(dt, f = dt$Player)

# Define a function that builds one player's block: a summary row, then the per-sport rows
sum_fun <- function(dt){
  playername <- unique(dt$Player)

  dt1 <- dt %>% 
    mutate(Total = Win + Loss) %>%
    select(-Player) 
  dt2 <- tibble(Sports = playername,   # tibble() replaces the deprecated data_frame()
                Win = sum(dt1$Win),
                Loss = sum(dt1$Loss),
                Total = sum(dt1$Total))
  dt3 <- bind_rows(dt2, dt1)

  return(dt3)
}

# Apply the function
dt_final <- dt_list %>%
  map_df(sum_fun) %>%   # map_df() already row-binds the per-player results
  rename(`Name&Sports` = Sports)
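One detail to watch if the result is written back to a CSV file: `Name&Sports` is not a syntactic R name, so read.csv() with its default check.names = TRUE turns the header into `Name.Sports` when the file is read again. A minimal base-R sketch, using a few rows of the target layout from the question and a temporary file standing in for a real (hypothetical) output path:

```r
# A few rows of the target layout from the question
out <- data.frame(Sports = c("Brian", "Football", "Basketball", "Bowling"),
                  Win = c(16, 5, 4, 7),
                  Loss = c(4, 3, 1, 0),
                  stringsAsFactors = FALSE)
out$Total <- out$Win + out$Loss
names(out)[1] <- "Name&Sports"   # not a syntactic R name

# tempfile() stands in for the real output path
path <- tempfile(fileext = ".csv")
write.csv(out, path, row.names = FALSE)

mangled <- read.csv(path)                       # header is repaired by make.names()
kept    <- read.csv(path, check.names = FALSE)  # header is kept exactly as written
print(names(mangled))
print(names(kept))
```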