I have two data frames with related data. This is NFL-related. One df has player names and receiving targets by week (player df):
Player Tm Position 1 2 3 4 5 6
1 A.J. Green CIN WR 13 8 11 12 8 10
2 Aaron Burbridge SFO WR 0 1 0 2 0 0
3 Aaron Ripkowski GNB RB 0 0 0 0 0 1
4 Adam Humphries TAM WR 5 8 12 4 2 0
5 Adam Thielen MIN WR 5 5 4 3 8 0
6 Adrian Peterson MIN RB 2 3 0 0 0 0
The other data frame has receiving targets summed by team for each week (team df):
Tm `1` `2` `3` `4` `5` `6`
<fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ARI 37 35 50 45 26 35
2 ATL 38 34 30 37 28 41
3 BAL 32 45 40 51 47 48
4 BUF 22 30 20 33 20 26
5 CAR 31 39 36 47 28 46
6 CHI 28 29 45 36 41 49
7 CIN 30 54 28 31 39 31
8 CLE 26 33 38 38 35 42
9 DAL 43 30 24 32 24 27
10 DEN 26 32 35 31 34 47
# ... with 22 more rows
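What I want to do is create another data frame with target percentages by week. So I need to match on the "Tm" column in the player df and on the week column headers (1-6).
I've figured out how to do this by merging them and then creating new columns, but every time I add more data (weeks), I have to write more code: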
a <- merge(playertgt, teamtgt, by="Tm") #merges the two
a$Wk1 <- a$`1.x` / a$`1.y`
a$Wk2 <- a$`2.x` / a$`2.y`
a$Wk3 <- a$`3.x` / a$`3.y`
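So what I'm looking for is a good way to do this that updates automatically, doesn't leave me with a df full of columns I don't need, and picks up each new week as I add it to my source data.
I apologize if this has been answered elsewhere, but I've spent a day looking for a good way to do this and couldn't find one. Thanks in advance for your help!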
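For reference, the per-week lines in the question can be generalized with a loop over the week column names, so new weeks are picked up without new code. This is a minimal sketch, assuming the week columns have the same names in both data frames (as the question's `1.x`/`1.y` columns imply) and that every team column other than `Tm` is a week; the two-player, two-team data below is a subset of the posted values, and `weeks`/`result` are illustrative names.

```r
# Subset of the posted data; week columns are named "1", "2", "3" as in the
# question (check.names = FALSE keeps those names as-is)
playertgt <- data.frame(Player = c("A.J. Green", "Adam Thielen"),
                        Tm = c("CIN", "MIN"),
                        `1` = c(13, 5), `2` = c(8, 5), `3` = c(11, 4),
                        check.names = FALSE)
teamtgt <- data.frame(Tm = c("CHI", "CIN"),
                      `1` = c(28, 30), `2` = c(29, 54), `3` = c(45, 28),
                      check.names = FALSE)

# Same merge as in the question: shared week names get .x / .y suffixes
a <- merge(playertgt, teamtgt, by = "Tm")

# Assumption: every team column except Tm is a week
weeks <- setdiff(names(teamtgt), "Tm")
for (w in weeks) {
  a[[paste0("Wk", w)]] <- a[[paste0(w, ".x")]] / a[[paste0(w, ".y")]]
}

# Keep only the identifiers and the percentage columns
result <- a[, c("Player", "Tm", paste0("Wk", weeks))]
print(result)
```

Note that `merge()` is an inner join by default, so a player whose team is missing from the team data (here Adam Thielen, MIN) is dropped; adding `all.x = TRUE` keeps such players with `NA` percentages.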
Answer 0 (score: 2)
You can use dplyr:
library(dplyr)
## Do a left outer join to match each player with total team targets
a <- left_join(playertgt,teamtgt, by="Tm")
## Compute percentage over all weeks selecting player columns ending with ".x"
## and dividing by corresponding team columns ending with ".y"
tgt.pct <- select(a,ends_with(".x")) / select(a,ends_with(".y"))
## set the column names to week + number
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
## construct the output data frame adding back the player and team columns
tgt.pct <- data.frame(Player=playertgt$Player,Tm=playertgt$Tm,tgt.pct)
Note that I use dplyr here only for the convenience of ends_with when selecting the columns after the join. A base-R way to do this selection, using grepl, is:
a <- merge(playertgt, teamtgt, by="Tm", all.x=TRUE)
## escape the dot so the patterns match a literal ".x" / ".y" suffix
tgt.pct <- subset(a,select=grepl("\\.x$",colnames(a))) / subset(a,select=grepl("\\.y$",colnames(a)))
colnames(tgt.pct) <- paste0("week",seq_len(ncol(teamtgt)-1))
## merge() returns rows sorted by Tm, so take Player and Tm from a, not playertgt
tgt.pct <- data.frame(Player=a$Player,Tm=a$Tm,tgt.pct)
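Another base-R option, swapped in here and not part of the answer above, skips the join entirely: look up each player's team row with `match()` and divide the week columns directly. This keeps the rows in `playertgt` order and does not rely on the `.x`/`.y` columns lining up by position. It is a sketch that assumes the week columns have identical names (`X1`, `X2`, ...) in both data frames, as in the `dput` data; the small inputs are a subset of the posted values, and `week_cols`/`idx` are illustrative names.

```r
# Subset of the posted data; week columns are X1, X2, X3 in both data frames
playertgt <- data.frame(Player = c("A.J. Green", "Adam Thielen"),
                        Tm = c("CIN", "MIN"),
                        Position = c("WR", "WR"),
                        X1 = c(13, 5), X2 = c(8, 5), X3 = c(11, 4))
teamtgt <- data.frame(Tm = c("CHI", "CIN"),
                      X1 = c(28, 30), X2 = c(29, 54), X3 = c(45, 28))

# Week columns = every team column except Tm, so new weeks are picked up
week_cols <- setdiff(names(teamtgt), "Tm")

# Row of teamtgt for each player; NA where the team is not in teamtgt
idx <- match(as.character(playertgt$Tm), as.character(teamtgt$Tm))

# Element-wise division; an NA index gives a row of NAs in the denominator
pct <- as.matrix(playertgt[week_cols]) / as.matrix(teamtgt[idx, week_cols])
dimnames(pct) <- list(NULL, paste0("week", seq_along(week_cols)))

# Rows are still in playertgt order, so the identifiers line up
tgt.pct <- data.frame(Player = playertgt$Player, Tm = playertgt$Tm, pct)
print(tgt.pct)
```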
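Data: with the limited data posted, only A.J. Green gets a target percentage, because CIN is the only one of the players' teams that appears in the posted team data: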
playertgt <- structure(list(Player = structure(1:6, .Label = c("A.J. Green",
"Aaron Burbridge", "Aaron Ripkowski", "Adam Humphries", "Adam Thielen",
"Adrian Peterson"), class = "factor"), Tm = structure(c(1L, 4L,
2L, 5L, 3L, 3L), .Label = c("CIN", "GNB", "MIN", "SFO", "TAM"
), class = "factor"), Position = structure(c(2L, 2L, 1L, 2L,
2L, 1L), .Label = c("RB", "WR"), class = "factor"), X1 = c(13L,
0L, 0L, 5L, 5L, 2L), X2 = c(8L, 1L, 0L, 8L, 5L, 3L), X3 = c(11L,
0L, 0L, 12L, 4L, 0L), X4 = c(12L, 2L, 0L, 4L, 3L, 0L), X5 = c(8L,
0L, 0L, 2L, 8L, 0L), X6 = c(10L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Player",
"Tm", "Position", "X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA,
-6L))
## Player Tm Position X1 X2 X3 X4 X5 X6
##1 A.J. Green CIN WR 13 8 11 12 8 10
##2 Aaron Burbridge SFO WR 0 1 0 2 0 0
##3 Aaron Ripkowski GNB RB 0 0 0 0 0 1
##4 Adam Humphries TAM WR 5 8 12 4 2 0
##5 Adam Thielen MIN WR 5 5 4 3 8 0
##6 Adrian Peterson MIN RB 2 3 0 0 0 0
teamtgt <- structure(list(Tm = structure(1:10, .Label = c("ARI", "ATL",
"BAL", "BUF", "CAR", "CHI", "CIN", "CLE", "DAL", "DEN"), class = "factor"),
X1 = c(37L, 38L, 32L, 22L, 31L, 28L, 30L, 26L, 43L, 26L),
X2 = c(35L, 34L, 45L, 30L, 39L, 29L, 54L, 33L, 30L, 32L),
X3 = c(50L, 30L, 40L, 20L, 36L, 45L, 28L, 38L, 24L, 35L),
X4 = c(45L, 37L, 51L, 33L, 47L, 36L, 31L, 38L, 32L, 31L),
X5 = c(26L, 28L, 47L, 20L, 28L, 41L, 39L, 35L, 24L, 34L),
X6 = c(35L, 41L, 48L, 26L, 46L, 49L, 31L, 42L, 27L, 47L)), .Names = c("Tm",
"X1", "X2", "X3", "X4", "X5", "X6"), class = "data.frame", row.names = c(NA,
-10L))
## Tm X1 X2 X3 X4 X5 X6
##1 ARI 37 35 50 45 26 35
##2 ATL 38 34 30 37 28 41
##3 BAL 32 45 40 51 47 48
##4 BUF 22 30 20 33 20 26
##5 CAR 31 39 36 47 28 46
##6 CHI 28 29 45 36 41 49
##7 CIN 30 54 28 31 39 31
##8 CLE 26 33 38 38 35 42
##9 DAL 43 30 24 32 24 27
##10 DEN 26 32 35 31 34 47
The result is:
## Player Tm week1 week2 week3 week4 week5 week6
##1 A.J. Green CIN 0.4333333 0.1481481 0.3928571 0.3870968 0.2051282 0.3225806
##2 Aaron Burbridge SFO NA NA NA NA NA NA
##3 Aaron Ripkowski GNB NA NA NA NA NA NA
##4 Adam Humphries TAM NA NA NA NA NA NA
##5 Adam Thielen MIN NA NA NA NA NA NA
##6 Adrian Peterson MIN NA NA NA NA NA NA
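Answer 1 (score: 2)
It would be better if you provided some data next time; it makes life much easier.
I think the key is your data structure. I think you need to put your data into long format (the keyword is tidy data, I guess). I made up some data; hopefully I understood your question correctly.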
library(tidyr)
library(dplyr)
player_df = data.frame(team = c('ARI', 'BAL', 'BAL', 'CLE', 'CLE'),
player =c('A', 'B', 'C', 'D', 'F'),
'1' = floor(runif(5, min=1, max=2)*10),
'2' = floor(runif(5, min=1, max=2)*10))
> player_df
team player X1 X2
1 ARI A 15 10
2 BAL B 16 15
3 BAL C 13 11
4 CLE D 14 19
5 CLE F 12 14
team_df = data.frame(team = c('ARI', 'BAL', 'CLE'),
'1' = floor(runif(3, min=10, max=20)*20),
'2' = floor(runif(3, min=10, max=20)*20))
> team_df
team X1 X2
1 ARI 281 205
2 BAL 362 309
3 CLE 323 238
Now, put both data frames into long format:
player_df = gather(player_df, week, player_value, -team, -player)
team_df = gather(team_df, week, team_value, -team)
> player_df
team player week player_value
1 ARI A X1 15
2 BAL B X1 16
3 BAL C X1 13
4 CLE D X1 14
5 CLE F X1 12
6 ARI A X2 10
7 BAL B X2 15
8 BAL C X2 11
9 CLE D X2 19
10 CLE F X2 14
> team_df
team week team_value
1 ARI X1 281
2 BAL X1 362
3 CLE X1 323
4 ARI X2 205
5 BAL X2 309
6 CLE X2 238
Now, join (or merge) them together. By default, inner_join joins on all columns the two data frames have in common (here team and week):
join_db = inner_join(player_df, team_df)
> join_db
team player week player_value team_value
1 ARI A X1 15 281
2 BAL B X1 16 362
3 BAL C X1 13 362
4 CLE D X1 14 323
5 CLE F X1 12 323
6 ARI A X2 10 205
7 BAL B X2 15 309
8 BAL C X2 11 309
9 CLE D X2 19 238
10 CLE F X2 14 238
I think you can do a lot more with the data in that format.
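The answer stops at the joined long table. A minimal sketch of the remaining steps (compute each player's share, then spread back to one column per week) might look like the following. It assumes tidyr 1.0.0 or later, where `pivot_longer()` and `pivot_wider()` supersede `gather()` and `spread()`, and it uses the printed values above as fixed data instead of `runif()` so the output is reproducible; `tgt_pct` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Fixed copy of the made-up data printed above (no runif())
player_df <- data.frame(team = c("ARI", "BAL", "BAL", "CLE", "CLE"),
                        player = c("A", "B", "C", "D", "F"),
                        X1 = c(15, 16, 13, 14, 12),
                        X2 = c(10, 15, 11, 19, 14))
team_df <- data.frame(team = c("ARI", "BAL", "CLE"),
                      X1 = c(281, 362, 323),
                      X2 = c(205, 309, 238))

# Long format: one row per (team, player, week) and per (team, week)
player_long <- pivot_longer(player_df, -c(team, player),
                            names_to = "week", values_to = "player_value")
team_long <- pivot_longer(team_df, -team,
                          names_to = "week", values_to = "team_value")

# Join on team and week explicitly, compute the share of team targets,
# then go back to one column per week
tgt_pct <- player_long %>%
  inner_join(team_long, by = c("team", "week")) %>%
  mutate(pct = player_value / team_value) %>%
  select(team, player, week, pct) %>%
  pivot_wider(names_from = week, values_from = pct)
print(tgt_pct)
```

Because the week is a value rather than part of a column name, new weeks flow through the same pipeline without any extra code.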
HTH
Stefan