Question

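I'm having trouble understanding how to widen data using multiple ID columns and value columns. Below is a summary of the code I'm working with. Each row in the data frame corresponds to the performance of one team, labelled in the Team column, at a particular time. My main goal is to line up each team's statistics with those of its opponent.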

Here is the code to recreate df:

library(tibble)

df <- tribble(
  ~MatchDate, ~"H/A", ~Team, ~Opponent, ~AvgScorePer3, ~AvgPointsPer3, ~AvgStrikesPer3, 
  "01/01/2020", "H", "Team 1", "Team 2", 3, 6, 10, 
  "02/01/2020", "A", "Team 1", "Team 3", 4, 7, 11, 
  "03/01/2020", "H", "Team 1", "Team 4", 5, 8, 14, 
  "01/01/2020", "H", "Team 2", "Team 1", 4, 10, 10,
  "02/02/2020", "H", "Team 2", "Team 4", 5, 7, 9, 
  "01/01/2020", "A", "Team 3", "Team 5", 4, 4, 7, 
  "02/01/2020", "A", "Team 3", "Team 1", 2, 3, 4, 
  "02/01/2020", "H", "Team 4", "Team 2", 3, 2, 3,
  "03/01/2020", "H", "Team 4", "Team 1", 4, 3, 5, 
  "01/01/2020", "A", "Team 5", "Team 3", 2, 6, 2
  )

The code below is an example of what I'm trying to achieve, which would let me calculate the differences between the key statistics.

df <- tribble (
  ~MatchDate, ~"H/A", ~Team, ~Opponent, ~AvgScorePer3Team, ~AvgPointsPer3Team, 
  ~AvgStrikesPer3Team, ~AvgScorePer3Opponent, ~AvgPointsPer3Opponent, ~AvgStrikesPer3Opponent, 
  "01/01/2020", "H", "Team 1", "Team 2", 3, 6, 10, 4, 10, 10,
  "02/01/2020", "A", "Team 1", "Team 3", 4, 7, 11, 2, 3, 4,
  "03/01/2020", "H", "Team 1", "Team 4", 5, 8, 14, 4, 3, 5,
  "01/01/2020", "H", "Team 2", "Team 1", 4, 10, 10, 3, 6, 10,
  "02/02/2020", "H", "Team 2", "Team 4", 5, 7, 9, 3, 2, 3, 
  "01/01/2020", "A", "Team 3", "Team 5", 4, 4, 7, 2, 6, 2,
  "02/01/2020", "A", "Team 3", "Team 1", 2, 3, 4, 4, 7, 11,
  "02/01/2020", "H", "Team 4", "Team 2", 3, 2, 3, 5, 7, 9, 
  "03/01/2020", "H", "Team 4", "Team 1", 4, 3, 5, 5, 8, 14, 
  "01/01/2020", "A", "Team 5", "Team 3", 2, 6, 2, 4, 4, 7 
)

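So far I've looked at pivot_wider, reshape, and dcast, but haven't been able to produce the right result. Because the new columns take on the names of the existing teams, I end up with far more variable columns than I expect!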

Answer 1

You are describing a self-join.

library(dplyr)

left_join(
  df,
  df,
  by = c("MatchDate" = "MatchDate", "Team" = "Opponent", "Opponent" = "Team"),
  suffix = c("Team", "Opponent")
)
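One detail worth noting about the join above: H/A exists on both sides and is not a join key, so it is also suffixed (H/ATeam and H/AOpponent). The following is a minimal sketch, assuming dplyr and tibble are installed, that drops H/A from the right-hand copy and then computes one difference. The two-row table pair_demo and the column name ScoreDiff are illustrative assumptions, not part of the original post.

```r
library(dplyr)
library(tibble)

# Two rows from the question's data: the same match seen from each side.
# (pair_demo is a hypothetical name used only for this sketch.)
pair_demo <- tribble(
  ~MatchDate, ~"H/A", ~Team, ~Opponent, ~AvgScorePer3, ~AvgPointsPer3, ~AvgStrikesPer3,
  "01/01/2020", "H", "Team 1", "Team 2", 3, 6, 10,
  "01/01/2020", "H", "Team 2", "Team 1", 4, 10, 10
)

paired <- left_join(
  pair_demo,
  select(pair_demo, -`H/A`),  # keep H/A only from the left-hand copy
  by = c("MatchDate" = "MatchDate", "Team" = "Opponent", "Opponent" = "Team"),
  suffix = c("Team", "Opponent")
) %>%
  # ScoreDiff is an assumed name for the team-minus-opponent difference.
  mutate(ScoreDiff = AvgScorePer3Team - AvgScorePer3Opponent)
```

The same mutate() pattern should extend to the points and strikes columns if those differences are needed as well.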

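In your example, I think the date of the match between Team 4 and Team 2 is inconsistent between the two records: the Team 2 row has 02/02/2020 while the Team 4 row has 02/01/2020. With the left join, those two rows find no partner and their opponent columns come back as NA.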
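A quick way to find such mismatches is to run the same key mapping through an anti-join, which keeps only the rows that have no mirrored record. This is a sketch under the same assumptions (dplyr and tibble installed). The table mismatch_demo is a hypothetical, trimmed-down copy of the two rows in question.

```r
library(dplyr)
library(tibble)

# The Team 2 / Team 4 rows from the question, with their differing dates.
# (mismatch_demo is a hypothetical name used only for this sketch.)
mismatch_demo <- tribble(
  ~MatchDate, ~Team, ~Opponent, ~AvgScorePer3,
  "02/02/2020", "Team 2", "Team 4", 5,
  "02/01/2020", "Team 4", "Team 2", 3
)

# Rows with no mirrored record (same date, Team and Opponent swapped).
unmatched <- anti_join(
  mismatch_demo,
  mismatch_demo,
  by = c("MatchDate" = "MatchDate", "Team" = "Opponent", "Opponent" = "Team")
)
```

Here both rows end up in unmatched, because neither date agrees with the other side's record. Running the same call on the full df should list every row that needs its date checked before joining.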
