How do I plot a heatmap with seaborn or matplotlib?

Time: 2019-06-05 17:56:37

Tags: python dataframe matplotlib seaborn heatmap

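I have a DataFrame that I want to visualize as a heatmap. I made a heatmap with matplotlib, but it shows data that is not in my DataFrame.

I adapted an example I found online to create a heatmap with matplotlib for my data. However, along the left and top of the chart there are some stray values that are not in my data, and I'm not sure how to get rid of them.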

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from io import StringIO

url = 'http://mcubed.net/ncaab/seeds.shtml'

#Getting the website text
data = requests.get(url).text

#Parsing the website
soup = BeautifulSoup(data, "html5lib")

#Create an empty list
dflist = []

#In the HTML, we don't want the <b> tag itself, but the text next to it.
#StringIO(b.next.next) takes that text and makes it readable to pandas
for b in soup.findAll({"b"})[2:-1]:
    dflist.append(pd.read_csv(StringIO(b.next.next), sep = r'\s+', header = None))

dflist[0]

#Created a new list, because the melt we are about to do cannot replace
#the dataframes in dflist in place
meltedDF = []

#The second item in the loop is the team number starting from 1
for df, teamnumber in zip(dflist, (np.arange(len(dflist))+1)):

    #Creating the team name
    name = "Team " + str(teamnumber)

    #Making the team name a column, with the values in df[0] and df[1] in our dataframes
    df[name] = df[0] + df[1]

    #Melting the dataframe to make the team name its own column
    meltedDF.append(df.melt(id_vars = [0, 1, 2, 3]))

# Concat all the melted DataFrames
allTeamStats = pd.concat(meltedDF)

# Final cleaning of our new single DataFrame
allTeamStats = allTeamStats.rename(columns = {0:name, 2:'Record', 3:'Win Percent', 'variable':'Team' , 'value': 'VS'})\
                           .reindex(['Team', 'VS', 'Record', 'Win Percent'], axis = 1)

allTeamStats
#Graph visualization Making a HeatMap
%matplotlib inline
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
y=["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16"]
x=["16","15","14","13","12","11","10","9","8","7","6","5","4","3","2","1"]
winp = []
for i in x:
    lst = []
    for j in y:
        percent = allTeamStats.loc[(allTeamStats["Team"]== 'Team '+i) &\
                                    (allTeamStats["VS"]== "vs.#"+j)]['Win Percent'].iloc[0]
        percent = float(percent[:-1])
        lst.append(percent)
    winp.append(lst)
winpercentage= np.array([[]])

fig,ax=plt.subplots(figsize=(18,18))
im= ax.imshow(winp, cmap='hot')
# We want to show all ticks...
ax.set_xticks(np.arange(len(y)))
ax.set_yticks(np.arange(len(x)))

# ... and label them with the respective list entries
ax.set_xticklabels(y)
ax.set_yticklabels(x)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

#  Loop over data dimensions and create text annotations.
for i in range(len(x)):
    for j in range(len(y)):
         text = ax.text(j, i, winp[i][j],
                        ha="center", va="center", color="red")

ax.set_title("Win Percentage of Each Matchup", fontsize= 40)
heatmap = plt.pcolor(winp)
plt.colorbar(heatmap)
ax.set_ylabel('Seeds', fontsize=40)
ax.set_xlabel('Seeds', fontsize=40)
plt.show()

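The result is what I want, except for the two strips along the left and top of the heatmap. I'm not sure where those values come from; to make them easier to see, I used cmap='hot', which makes the values that shouldn't be there stand out. Could someone help me fix the code so it plots correctly, or show me how to draw a whole new heatmap of this data with seaborn (my TA told me to try seaborn, but I have never used it)? Any help is appreciated, thanks!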

1 Answer:

Answer 0: (score: 0)

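I think the culprit is this line in your code: im= ax.imshow(winp, cmap='hot'). Remove it and try again. Basically, anything plotted after that line is drawn on top of what that line created. The "margins" on the left and top are the only part of the underlying image that is still visible.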
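The overlay can also be avoided the other way around, by keeping imshow and dropping the pcolor call. imshow centers each cell on integer coordinates, while pcolor draws cells starting at 0, so the second layer is shifted by half a cell and leaves a strip of the first image showing. Keeping imshow also keeps the tick labels and text annotations centered on the cells. Below is a minimal sketch of that idea, not a tested fix for the exact scraped data: the random `winp` array is placeholder data standing in for the real win percentages, and the function names `plot_heatmap` and `plot_heatmap_seaborn` are made up for illustration. The second function shows roughly how the same plot might look with seaborn's `heatmap`, assuming seaborn is installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (assumption): random values instead of the scraped percentages.
rng = np.random.default_rng(0)
winp = rng.uniform(0, 100, size=(16, 16)).round(1)
col_labels = [str(s) for s in range(1, 17)]      # same order as y in the question
row_labels = [str(s) for s in range(16, 0, -1)]  # same order as x in the question


def plot_heatmap(data, row_labels, col_labels):
    """Draw a single imshow layer and attach the colorbar to that image."""
    fig, ax = plt.subplots(figsize=(10, 10))
    im = ax.imshow(data, cmap="hot")

    # Show every tick and label it with the matching seed.
    ax.set_xticks(np.arange(len(col_labels)))
    ax.set_yticks(np.arange(len(row_labels)))
    ax.set_xticklabels(col_labels)
    ax.set_yticklabels(row_labels)

    # Annotate each cell; imshow cells are centered on (j, i).
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            ax.text(j, i, f"{data[i, j]:.1f}",
                    ha="center", va="center", color="tab:blue")

    # Reuse the imshow image for the colorbar instead of adding a pcolor layer.
    fig.colorbar(im, ax=ax)
    ax.set_title("Win Percentage of Each Matchup")
    ax.set_xlabel("Seeds")
    ax.set_ylabel("Seeds")
    return fig, ax


def plot_heatmap_seaborn(data, row_labels, col_labels):
    """Same plot via seaborn.heatmap (requires seaborn and pandas)."""
    import pandas as pd
    import seaborn as sns

    frame = pd.DataFrame(data, index=row_labels, columns=col_labels)
    fig, ax = plt.subplots(figsize=(10, 10))
    # annot=True writes the values into the cells; the colorbar is added by default.
    sns.heatmap(frame, annot=True, fmt=".1f", cmap="hot", ax=ax)
    ax.set_title("Win Percentage of Each Matchup")
    ax.set_xlabel("Seeds")
    ax.set_ylabel("Seeds")
    return fig, ax


fig, ax = plot_heatmap(winp, row_labels, col_labels)
```

With the real data, `winp` would be the nested list built in the question's loop, converted with `np.array(winp)`; in a notebook the figure is displayed as usual with `plt.show()`.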