Constructing a new data frame by merging two rows of an existing data frame on their common column values

Time: 2017-05-31 04:20:12

Tags: r dataframe

https://www.dropbox.com/s/prqiojwzpax339z/Test123.xlsx?dl=0

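The link points to an xlsx file with a batsman's batting details, recording his score in each innings of the Test matches he played. Because a batsman can bat in two innings of a single Test match, the two rows for a match share the same values in columns such as Opposition, Ground, Start.DateAscending, Match.Number and Result.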

Question: how can we group the rows by these matching values and create a new data frame containing the merged rows?

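Example: from the data shared via the link, I took the first two rows as a sample to illustrate what I want to achieve. They are reproduced below as R structure() code: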

structure(list(Runs = c("10", "27"), Mins = c("30", "93"), BF = c("19", 
"65"), X4s = c("1", "4"), X6s = c("0", "0"), SR = c("52.63", 
"41.53"), Pos = c("6", "6"), Dismissal = c("bowled", "caught"
), Inns = c(2, 4), Opposition = c("v England", "v England"), 
    Ground = c("Lord's", "Lord's"), Start.DateAscending = structure(c(648930600, 
    648930600), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Match.Number = c("Test # 1148", "Test # 1148"), Result = c("Loss", 
    "Loss")), .Names = c("Runs", "Mins", "BF", "X4s", "X6s", 
"SR", "Pos", "Dismissal", "Inns", "Opposition", "Ground", "Start.DateAscending", 
"Match.Number", "Result"), row.names = 1:2, class = "data.frame")

The data from the block above looks like this:

  Runs Mins BF X4s X6s    SR Pos Dismissal Inns Opposition Ground
1   10   30 19   1   0 52.63   6    bowled    2  v England Lord's
2   27   93 65   4   0 41.53   6    caught    4  v England Lord's
  Start.DateAscending Match.Number Result
1          1990-07-26  Test # 1148   Loss
2          1990-07-26  Test # 1148   Loss

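So what I want to achieve is to sum the Runs column over rows that share the same values in the common columns (such as Match.Number, Opposition, Ground and Start.DateAscending).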

I would like the values below to be stored in a new data frame:

 Runs   Opposition  Ground Start.DateAscending Match.Number Result
1   37     v England Lord's 1990-07-26          Test # 1148   Loss

1 answer:

Answer 0: (score: 1)

We can use aggregate on a subset of the dataset's columns, after converting 'Runs' to the numeric class.

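df1$Runs <- as.numeric(df1$Runs)  # 'Runs' is character in the sample
colsofinterest <- names(df1)[c(1, 10:ncol(df1))]  # 'Runs' plus the match-level columns
aggregate(Runs~., df1[colsofinterest], sum)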
#  Opposition Ground Start.DateAscending Match.Number Result Runs
#1  v England Lord's          1990-07-26  Test # 1148   Loss   37
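Selecting columns by position (1 and 10 onwards) breaks if the spreadsheet layout changes. As a minimal sketch, assuming the grouping columns are known by name, the same aggregate call can be built from a vector of column names. The sample rows are rebuilt here with only the columns needed. The UTC time zone and the names match_cols and per_match are assumptions made for illustration, not part of the original answer.

```r
# Rebuild the two-row sample from the question (only the columns needed here).
# The time zone is an assumption; the original data has an empty tzone.
df1 <- data.frame(
  Runs = c("10", "27"),
  Inns = c(2, 4),
  Opposition = c("v England", "v England"),
  Ground = c("Lord's", "Lord's"),
  Start.DateAscending = as.POSIXct(c("1990-07-26", "1990-07-26"), tz = "UTC"),
  Match.Number = c("Test # 1148", "Test # 1148"),
  Result = c("Loss", "Loss"),
  stringsAsFactors = FALSE
)

# Runs is stored as character, so convert it before summing.
df1$Runs <- as.numeric(df1$Runs)

# Name the grouping columns explicitly instead of relying on positions.
match_cols <- c("Opposition", "Ground", "Start.DateAscending",
                "Match.Number", "Result")

# Build the formula Runs ~ Opposition + Ground + ... from those names.
f <- reformulate(match_cols, response = "Runs")

# One row per match, with Runs summed over the innings.
per_match <- aggregate(f, data = df1, FUN = sum)
per_match
```

Because Inns is not among the grouping columns and is not the response, it is simply left out of the result, which matches the output the question asks for.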

Or we can use the tidyverse:

colsofinterest2 <- names(df1)[10:ncol(df1)]
library(dplyr)
df1 %>%
    group_by(across(all_of(colsofinterest2))) %>%  # replaces the deprecated group_by_()
    summarise(Runs = sum(as.integer(Runs)))
# A tibble: 1 x 6
# Groups: Opposition, Ground, Start.DateAscending, Match.Number [?]
#  Opposition Ground Start.DateAscending Match.Number Result  Runs
#       <chr>  <chr>              <dttm>        <chr>  <chr> <int>
#1  v England Lord's          1990-07-26  Test # 1148   Loss    37
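If more than the run total is needed per match, the same grouping idea carries over to several columns at once. The base R sketch below is an assumption about what such an extension might look like, not part of the original answer. It rebuilds a cut-down copy of the sample rows, and the names sum_cols and match_totals are made up for illustration.

```r
# Cut-down copy of the sample rows; the counts are stored as character,
# as in the question.
df1 <- data.frame(
  Runs = c("10", "27"),
  Mins = c("30", "93"),
  BF   = c("19", "65"),
  Opposition   = c("v England", "v England"),
  Ground       = c("Lord's", "Lord's"),
  Match.Number = c("Test # 1148", "Test # 1148"),
  Result       = c("Loss", "Loss"),
  stringsAsFactors = FALSE
)

# Convert every column that should be summed to integer.
sum_cols <- c("Runs", "Mins", "BF")
df1[sum_cols] <- lapply(df1[sum_cols], as.integer)

# cbind() on the left-hand side sums several columns per match in one call;
# the right-hand side lists the columns that identify a match.
match_totals <- aggregate(
  cbind(Runs, Mins, BF) ~ Opposition + Ground + Match.Number + Result,
  data = df1,
  FUN = sum
)
match_totals
```

Columns such as SR or Dismissal do not have a meaningful sum, so they would need a different summary (or be dropped) rather than being added to sum_cols.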