I have the following dataset:
Date Team 1 Team 2 Team 3 Team 4 Team 5 Team 6
25-Sep-18 17 9 11 14 19 9
24-Sep-18 18 3 2 19 16 5
21-Sep-18 15 11 4 11 9 5
20-Sep-18 1 12 13 18 11 2
19-Sep-18 10 5 6 16 16 13
18-Sep-18 1 13 1 18 5 2
17-Sep-18 16 3 1 13 18 11
14-Sep-18 6 9 18 17 17 1
13-Sep-18 8 4 19 17 4 10
12-Sep-18 6 13 14 6 12 14
11-Sep-18 15 7 9 12 4 3
10-Sep-18 3 11 11 2 5 19
7-Sep-18 1 17 13 9 18 1
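I can rank the Team columns and identify the maximum values, but I am having trouble creating another data frame that holds each team's maximum value and the corresponding date, for example: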
Team Name Date Result
Team 1 24-Sep 18
Team 2 7-Sep 17
Team 3 13-Sep 19
Team 4 24-Sep 19
Team 5 25-Sep 19
Team 6 10-Sep 19
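From reading the forums I can't work out whether it is better to rank the columns and then use the match function to get the dates, or whether I should find the index position of the maximum value and then use that to build the new data frame.
(As you can probably tell, I'm really lost at this point. I'm sure there is a simpler solution than the one I'm using, and I hope someone can point me in the right direction.)
Many thanks.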
Answer 0 (score: 0)
Here is a base R approach:
do.call(rbind, lapply(paste0("Team", 1:6), function(x) {
#for each team x, find the row that has the largest score
n <- which.max(df[,x])
#extract the columns that you want
data.frame(Team=x, Date=df$Date[n], Result=df[n, x])
}))
Output:
Team Date Result
1 Team1 24-Sep-18 18
2 Team2 7-Sep-18 17
3 Team3 13-Sep-18 19
4 Team4 24-Sep-18 19
5 Team5 25-Sep-18 19
6 Team6 10-Sep-18 19
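Or a data.table approach:
library(data.table)
# Reshape to long format: one row per (Date, Team) pair
mDT <- melt(setDT(df), id.vars="Date", variable.name="Team", value.name="Result")
# For each Team, get the row number (.I) of the largest Result, then subset by those rows
mDT[mDT[, .I[which.max(Result)], by=.(Team)]$V1]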
Output:
Date Team Result
1: 24-Sep-18 Team1 18
2: 7-Sep-18 Team2 17
3: 13-Sep-18 Team3 19
4: 24-Sep-18 Team4 19
5: 25-Sep-18 Team5 19
6: 10-Sep-18 Team6 19
Data:
df <- read.table(text="Date Team1 Team2 Team3 Team4 Team5 Team6
25-Sep-18 17 9 11 14 19 9
24-Sep-18 18 3 2 19 16 5
21-Sep-18 15 11 4 11 9 5
20-Sep-18 1 12 13 18 11 2
19-Sep-18 10 5 6 16 16 13
18-Sep-18 1 13 1 18 5 2
17-Sep-18 16 3 1 13 18 11
14-Sep-18 6 9 18 17 17 1
13-Sep-18 8 4 19 17 4 10
12-Sep-18 6 13 14 6 12 14
11-Sep-18 15 7 9 12 4 3
10-Sep-18 3 11 11 2 5 19
7-Sep-18 1 17 13 9 18 1", header=TRUE)
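The question also asked about the second option: finding the index position of each column's maximum. As a minimal sketch of that idea (not one of the posted answers), `sapply()` can apply `which.max()` to every Team column at once, and the resulting row indices then pick out the dates. It assumes the same data as the base R answer, with columns named `Team1` to `Team6`; `idx` and `res` are illustrative names.

```r
# Same data as in the base R answer (columns Team1 ... Team6)
df <- read.table(text="Date Team1 Team2 Team3 Team4 Team5 Team6
25-Sep-18 17 9 11 14 19 9
24-Sep-18 18 3 2 19 16 5
21-Sep-18 15 11 4 11 9 5
20-Sep-18 1 12 13 18 11 2
19-Sep-18 10 5 6 16 16 13
18-Sep-18 1 13 1 18 5 2
17-Sep-18 16 3 1 13 18 11
14-Sep-18 6 9 18 17 17 1
13-Sep-18 8 4 19 17 4 10
12-Sep-18 6 13 14 6 12 14
11-Sep-18 15 7 9 12 4 3
10-Sep-18 3 11 11 2 5 19
7-Sep-18 1 17 13 9 18 1", header=TRUE)

# Row index of the first maximum in every Team column (df[-1] drops Date)
idx <- sapply(df[-1], which.max)

# Look up the dates with those row indices; the results are the column maxima
res <- data.frame(
  Team   = names(idx),
  Date   = df$Date[idx],
  Result = unname(sapply(df[-1], max))
)
res
```

Like the base R answer, this keeps only the first date when a team reaches its maximum on more than one day.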
Answer 1 (score: 0)
Here is a tidyverse approach:
library(tidyverse)
tmp <- data.table::fread(
" Date Team_1 Team_2 Team_3 Team_4 Team_5 Team_6
25-Sep-18 17 9 11 14 19 9
24-Sep-18 18 3 2 19 16 5
21-Sep-18 15 11 4 11 9 5
20-Sep-18 1 12 13 18 11 2
19-Sep-18 10 5 6 16 16 13
18-Sep-18 1 13 1 18 5 2
17-Sep-18 16 3 1 13 18 11
14-Sep-18 6 9 18 17 17 1
13-Sep-18 8 4 19 17 4 10
12-Sep-18 6 13 14 6 12 14
11-Sep-18 15 7 9 12 4 3
10-Sep-18 3 11 11 2 5 19
7-Sep-18 1 17 13 9 18 1"
)
# Parse the dates, reshape to long format, and keep the top score per team
df.tmp <- tmp %>%
mutate(Date = lubridate::as_date(Date,format = "%d-%b-%y",tz="")) %>%
gather(starts_with("Team"),key= "team_name",value = "Results") %>%
group_by(team_name) %>%
top_n(n = 1, wt = Results) %>%
arrange(team_name)
df.tmp
#> # A tibble: 6 x 3
#> # Groups: team_name [6]
#> Date team_name Results
#> <date> <chr> <int>
#> 1 2018-09-24 Team_1 18
#> 2 2018-09-07 Team_2 17
#> 3 2018-09-13 Team_3 19
#> 4 2018-09-24 Team_4 19
#> 5 2018-09-25 Team_5 19
#> 6 2018-09-10 Team_6 19
Created on 2018-09-27 by the reprex package (v0.2.1)
Answer 2 (score: 0)
Here is a data.table approach.
library(data.table)
set.seed(1)
# Create fake dataset.
dt <- data.table(Date = paste0("Date", 1:10), Team1 = rnorm(10), Team2 = rnorm(10), Team3 = rnorm(10), Team4 = rnorm(10), Team5 = rnorm(10), Team6 = rnorm(10))
# Change format of fake dataset.
longDT <- melt(dt, id.vars = "Date", variable.name = "Team", value.name = "Result")
# Get the dates with the highest result for each team.
maxDate <- longDT[, list(MaxDate = Date[which.max(Result)]), by = Team]
# Inner join `longDT` and `maxDate` to retrieve the desired output.
want <- merge(longDT, maxDate, by.x = c("Date", "Team"), by.y = c("MaxDate", "Team"))
setorder(want, Team)
setcolorder(want, c("Team", "Date", "Result"))
want
Team Date Result
1: Team1 Date3 1.9220531
2: Team2 Date6 0.7487642
3: Team3 Date3 1.4940476
4: Team4 Date1 2.0749170
5: Team5 Date1 0.9347443
6: Team6 Date2 1.0755934
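Answer 3 (score: 0)
Here is a two-line, single-chain solution using tidyr:
library(tidyr)
# Reshape to long format, then keep the rows whose value equals their team's maximum
gather(df, key = "Team", value = "value", Team1:Team6) %>%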
.[ave(.$value, .$Team, FUN = function(x) x == max(x)) > 0, ]
# Date Team value
# 2 24-Sep-18 Team1 18
# 26 7-Sep-18 Team2 17
# 35 13-Sep-18 Team3 19
# 41 24-Sep-18 Team4 19
# 53 25-Sep-18 Team5 19
# 77 10-Sep-18 Team6 19
Data:
df <- read.table(text="Date Team1 Team2 Team3 Team4 Team5 Team6
25-Sep-18 17 9 11 14 19 9
24-Sep-18 18 3 2 19 16 5
21-Sep-18 15 11 4 11 9 5
20-Sep-18 1 12 13 18 11 2
19-Sep-18 10 5 6 16 16 13
18-Sep-18 1 13 1 18 5 2
17-Sep-18 16 3 1 13 18 11
14-Sep-18 6 9 18 17 17 1
13-Sep-18 8 4 19 17 4 10
12-Sep-18 6 13 14 6 12 14
11-Sep-18 15 7 9 12 4 3
10-Sep-18 3 11 11 2 5 19
7-Sep-18 1 17 13 9 18 1", header=TRUE)
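The answers differ in how they treat ties, which none of the teams in this sample happen to have. `which.max()` (used in the base R and data.table answers) returns only the first row that reaches the maximum, while a comparison such as `x == max(x)` (the tidyr answer) or `top_n()` (the tidyverse answer) keeps every tied row. A small illustration with a made-up vector `x`, which is not from the question's data:

```r
# Hypothetical scores with a tie for the maximum (not from the question's data)
x <- c(5, 19, 7, 19)

which.max(x)        # 2: only the first position holding the maximum
which(x == max(x))  # 2 4: every position holding the maximum
```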