Using ggplot to plot a network with data from one file, using another file for grouping

Time: 2017-10-18 09:49:32

Tags: r ggplot2

I am most likely overlooking a simple solution. I have two data frames. The first is a simple edge list with weights, which looks like this:

head(merge_allwinsloss_df)

winner loser weight
1    CAL   HAW     20
2   TENN   APP      7
3    LOU  CHAR     56
4    CMU   PRE     46
5   WAKE  TULN      4
6    CIN   UTM     21

The second is a file that provides the groupings (in the form of college football conferences), which looks like this:

 short conference
1   TEM        AAC
2   USF        AAC
3   UCF        AAC
4   CIN        AAC
5   ECU        AAC
6  CONN        AAC

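What I would like to do is create a plot (preferably with ggplot) that uses a directed graph (from winner to loser), weights the edges (by weight), and color-codes the nodes by conference so that teams in the same conference share a color. The code below is a "start", but I have not really gotten anywhere.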

ggplot(data = merge_allwinsloss_df, aes(from_id = winner, to_id = loser)) +
  geom_net(aes(color = all_teams_by_conference_df), layout.alg = "fruchtermanreingold", 
           size = 2, labelon = TRUE, vjust = -0.6, ecolour = "grey80",
           directed = TRUE, fontsize = 3, ealpha = 0.5) +
  scale_color_brewer("Conference",
                     palette = "Paired") +
  xlim(c(-0.05, 1.05)) +
  theme_net() +
  theme(legend.position = "bottom")

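I melted the data, but that led to a number of other problems, mostly related to lost mappings or to not being able to figure out how to correctly tag the teams in merge_allwinsloss_df by conference. Sorry if this is not clear. I have been looking for help and banging my head against this for days, so any help would be greatly appreciated. Thanks in advance.

Update: here is a minimal example.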

#Create a list of CFB winners and losers with weight given by point differential
merge_allwinsloss_ALT_df <- data.frame(matrix(c("CAL","HAW", 12, "TENN", "APP", 7, "LOU", "CHAR", 56, 
                                  "CMU", "HAW", 0, "WVU", "APP", 20 , "ARK", "TENN", 6, "CMU", "WVU", 7,
                                  "WVU", "JMU", 15, "IND", "MIN", 3, "IND", "HAW", 14, "FSU", "TCU", 2, 
                                  "TCU", "ARK", 14),
            nrow=12,ncol=3,byrow=TRUE))
colnames(merge_allwinsloss_ALT_df) <- c("winner", "loser", "weight")
merge_allwinsloss_ALT_df

#Create a list of CFB teams with conference associations
all_teams_by_conference_ALT_df<- data.frame(matrix(c("CAL","PAC", "HAW", "MAC", "TENN", "SEC", 
                                                     "APP", "SUN BELT", "LOU", "ACC", "CHAR", "FCS", 
                                                "CMU", "MAC", "WVU", "BIG 12", "ARK", "SEC", "JMU", "FCS",
                                                "IND", "BIG 10", "MIN", "BIG 10", "FSU", "ACC", "TCU",
                                                "BIG 12"),
                                              nrow=14,ncol=2,byrow=TRUE))
colnames(all_teams_by_conference_ALT_df) <- c("team", "conference")
all_teams_by_conference_ALT_df

# (Attempt to) plot the two data files, using the first as the nodes and the
# second as a reference file for coloring by conference.

ggplot(data = merge_allwinsloss_ALT_df, aes(from_id = winner, to_id = loser)) +
  geom_net(aes(color = all_teams_by_conference_ALT_df), layout.alg = "fruchtermanreingold", 
           size = 2, labelon = TRUE, vjust = -0.6, ecolour = "grey80",
           directed = TRUE, fontsize = 3, ealpha = 0.5) +
  scale_color_brewer("Conference",
                     palette = "Paired") +
  xlim(c(-0.05, 1.05)) +
  theme_net() +
  theme(legend.position = "bottom")

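I realize something is off here, but I cannot figure it out. In addition, I would like to set it up so that (a) all teams in the same conference that play each other share a common color for their edges, and (b) the edges are weighted using the weight column of merge_allwinsloss_df_ALT.

Thanks for your help!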

1 Answer:

Answer 0 (score: 1)

You need to join the two tables together so that everything is in one data frame.

To add the winner's conference, do the following:

df1 <- merge(merge_allwinsloss_ALT_df, all_teams_by_conference_ALT_df,
  by.x = "winner", by.y = "team", all.x = TRUE)

To capture both the winning and the losing teams' conferences, I would then rename df1$conference to "conference_winner" and run the same merge again on df1, this time with by.x="loser".
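The two merges described above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `games` and `confs` are hypothetical short stand-ins for the two data frames in the question (three games copied from the minimal example), and the `same_conference` column is an extra that is not part of the original answer.

```r
# Hypothetical short names standing in for the question's two data frames
games <- data.frame(
  winner = c("CAL", "TENN", "ARK"),
  loser  = c("HAW", "APP", "TENN"),
  weight = c(12, 7, 6),
  stringsAsFactors = FALSE
)
confs <- data.frame(
  team       = c("CAL", "HAW", "TENN", "APP", "ARK"),
  conference = c("PAC", "MAC", "SEC", "SUN BELT", "SEC"),
  stringsAsFactors = FALSE
)

# Step 1: attach the winner's conference, then rename the new column
df1 <- merge(games, confs, by.x = "winner", by.y = "team", all.x = TRUE)
names(df1)[names(df1) == "conference"] <- "conference_winner"

# Step 2: run the same merge again, this time keyed on the loser
df2 <- merge(df1, confs, by.x = "loser", by.y = "team", all.x = TRUE)
names(df2)[names(df2) == "conference"] <- "conference_loser"

# Extra (not in the answer): flag games played within a single conference
df2$same_conference <- df2$conference_winner == df2$conference_loser
df2
```

Renaming the column between the two merges matters: it keeps the second merge from producing a second column that is also called conference.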

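Also, I would suggest using shorter names for your data frames. There is no point in typing merge_allwinsloss_ALT_df over and over. In addition, merge is a function, so using it in a name gets confusing when you work through the problem (see my code above, merge(merge...), which is a result of your naming convention).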

After that, you only need to map color and/or fill to conference_winner or conference_loser.
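As a rough illustration of the mapping step, here is a sketch that uses plain ggplot2 layers instead of geom_net. Several things here are assumptions and not part of the answer: geom_segment and geom_point stand in for geom_net, a simple circular layout stands in for "fruchtermanreingold", the names `edges` and `nodes` are hypothetical, and the `linewidth` aesthetic assumes ggplot2 3.4.0 or later. It also assumes weight is numeric. In the question's minimal example, matrix() coerces the weights to text, so something like as.numeric(as.character(weight)) would be needed first.

```r
library(ggplot2)

# Hypothetical stand-ins for the edge list and the conference table
edges <- data.frame(
  winner = c("CAL", "TENN", "ARK"),
  loser  = c("HAW", "APP", "TENN"),
  weight = c(12, 7, 6),
  stringsAsFactors = FALSE
)
nodes <- data.frame(
  team       = c("CAL", "HAW", "TENN", "APP", "ARK"),
  conference = c("PAC", "MAC", "SEC", "SUN BELT", "SEC"),
  stringsAsFactors = FALSE
)

# Assumed layout: place the teams evenly on a unit circle
angle <- 2 * pi * (seq_len(nrow(nodes)) - 1) / nrow(nodes)
nodes$x <- cos(angle)
nodes$y <- sin(angle)

# Look up start (winner) and end (loser) coordinates for every edge
from <- match(edges$winner, nodes$team)
to   <- match(edges$loser, nodes$team)
edges$x    <- nodes$x[from]
edges$y    <- nodes$y[from]
edges$xend <- nodes$x[to]
edges$yend <- nodes$y[to]

# Arrows run from winner to loser; width follows weight; nodes are coloured by conference
p <- ggplot() +
  geom_segment(data = edges,
               aes(x = x, y = y, xend = xend, yend = yend, linewidth = weight),
               colour = "grey60", arrow = arrow(length = unit(3, "mm"))) +
  geom_point(data = nodes, aes(x = x, y = y, colour = conference), size = 4) +
  geom_text(data = nodes, aes(x = x, y = y, label = team), vjust = -1, size = 3) +
  scale_colour_brewer("Conference", palette = "Paired") +
  theme_void() +
  theme(legend.position = "bottom")
p
```

Keeping the node table separate from the edge table means every team gets a conference colour, including teams that only ever appear as a loser.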